Satisfaction With Nursing


Helen Nahm 
Duke University 


There is growing recognition of the close relationship which exists 
between the extent of satisfaction within a professional group and the 
quality of service which that group renders to society. Within recent 
years nursing service administrators, hospital administrators, doctors, 
and others have become increasingly concerned about the apparent dis- 
satisfaction of many graduate nurses. Faculty members in schools of 
nursing have frequently noted that students, who at the time of admission 
seem highly motivated and very enthusiastic about their chosen profes- 
sion, undergo a gradual change, and, by the time of graduation, are too 
frequently a dissatisfied and, at times, a disillusioned group of young
women. Faculty members have believed that, if it were possible to create 
an environment in a school of nursing in which the enthusiasm and high 
motivation of students might be preserved, the dissatisfaction after 
graduation might be materially reduced. 


Studies on Satisfaction With Nursing 


A number of studies have been made on the satisfaction of graduate 
nurses and the factors which are associated with it (1, 2, 3, 5). During
the Spring of 1944 a study was made by the writer (6) on the satisfaction 
of 428 senior students in 12 schools of nursing in Minnesota. Findings 
of this study indicated that 85.6 per cent of the students either liked or 
were enthusiastic about nursing, 13 per cent were indifferent, and only 
1.4 per cent disliked it. However, the number who either liked or were 
enthusiastic about nursing varied from 66 per cent in one school to 97 
per cent in another. Significant differences between mean scores on a 
nursing satisfaction scale also indicated that students in some schools of 
nursing are much better satisfied with nursing than those in others. 

From reactions of students to a number of questionnaire items de- 
signed to discover factors associated with satisfaction or dissatisfaction, 
it seemed evident that satisfaction was associated with a liking for
bedside care of patients, and a feeling that the nursing school program had
been well-planned and that students had had adequate experience in the 
major nursing services. Other factors which were associated with satis- 
faction were the kinds of relationships which were established with faculty 
members, head nurses, doctors, and others; provisions which were made 
for the welfare of students; opportunities to help plan work, to use initia- 
tive, and to express ideas on hospital divisions; and opportunities to ad- 
vance and to earn adequate salaries after graduation. Dissatisfied 
students were more likely to complain of chronic fatigue, and to feel that 
classes were dull and boring. 

When an attempt was made to relate reactions to various question- 
naire items to mean scores on the Nursing Satisfaction Scale it seemed 
evident that, in some of the schools in which many unsatisfactory con- 
ditions were present, students were, strangely enough, highly satisfied. 
In other schools in which many provisions were made for the welfare and 
happiness of students, they were not so well satisfied. Comparisons 
which were made between high and low satisfaction groups in three of 
the schools of nursing also indicated that the factors which are associated 
with satisfaction probably vary from one situation to another. For ex- 
ample, in two of the schools satisfied groups had more satisfactory rela- 
tionships with head nurses and supervisors, and were better satisfied with 
provisions which were made for student welfare. In the third school
these factors did not distinguish between high and low groups. In one
school the dissatisfied group was of significantly higher intellectual ability 
than the satisfied group. In the other two schools the ability level of 
high and low groups was about the same. It seems probable, from these 
findings, that, in each school of nursing, there is a complex set of interre- 
lationships between conditions which are actually present in the environ- 
ment and willingness of students to accept and adjust to such conditions. 


General Plan of Present Study 


During the Spring of 1947 a study was made of the satisfaction of
three groups¹ of students who were enrolled in the Duke University School
of Nursing. The groups included 52 students who were enrolled in the
freshman class, 62 students who were enrolled in the junior class, and 70
students who were enrolled in the senior class. The primary purpose of
the study was to determine whether there were differences among 
groups of students both in the extent of satisfaction with nursing and in 
the factors which are associated with it. 

¹ It is customary in schools of nursing which offer three-year programs to refer to
first-year students as freshmen, second-year students as juniors, and third-year students
as seniors.

To measure satisfaction with nursing an adaptation of the Hoppock
(4) Job Satisfaction Scale was used. This Scale is made up of four sec- 
tions, with seven possible responses to each section. Responses range 
from those indicating a high degree of satisfaction with nursing to those 
indicating a high degree of dissatisfaction. Each section is scored from 
1 to 7 and total scores may, therefore, range from 4 to 28. 
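
The scoring rule just described can be sketched in a few lines. This is only an illustration: the function names and the label used for totals below 8 are hypothetical, and the category bands are the ones printed in Table 1 of this paper.

```python
# Sketch of scoring the four-section satisfaction scale.
# Function names and the "below tabled range" label are hypothetical;
# the bands are those printed in Table 1.

BANDS = [
    (24, 28, "Enthusiastic"),
    (20, 23, "Likes it"),
    (16, 19, "Indifferent"),
    (12, 15, "Doesn't like it"),
    (8, 11, "Dislikes it"),
]


def total_score(section_scores):
    """Sum four section scores, each an integer from 1 to 7."""
    if len(section_scores) != 4:
        raise ValueError("the scale has exactly four sections")
    if any(not (1 <= s <= 7) for s in section_scores):
        raise ValueError("each section is scored from 1 to 7")
    return sum(section_scores)


def category(total):
    """Map a total score (4 to 28) to the band it falls in."""
    for low, high, label in BANDS:
        if low <= total <= high:
            return label
    # Totals of 4 to 7 are possible but are not listed in Table 1.
    return "below tabled range"


if __name__ == "__main__":
    scores = [6, 5, 6, 5]  # hypothetical responses of one student
    total = total_score(scores)
    print(total, category(total))
```

A student answering 6, 5, 6, 5 on the four sections would total 22 and fall in the "Likes it" band.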

To determine factors associated with satisfaction and dissatisfaction 
students were asked to respond to 87 items designed to measure reactions 
to working and living conditions; relationships which were established 
with faculty members, head nurses, doctors, and patients; organized class 
work and clinical experience; and general provisions which were made for 
the welfare of students. Freshman students were asked to list the three 
things which they liked best about the Duke University School of Nursing. 
All students were asked to list the problems or difficulties which had been 
of most concern to them since entering nursing training and to give sug- 
gestions for improvement. 

In presenting findings of this study a few explanations seem in order. 
The study on the Duke University School of Nursing students was made 
during the Spring of 1947, a period in which the stresses and tensions in 
hospitals were probably as great as or even greater than at any time dur- 
ing the period of World War II. Furthermore, in the year preceding the 
time the study was made, a number of individuals in key faculty posi- 
tions had resigned, and had been replaced by individuals who were un- 
familiar to junior and senior students in the school. Attitudes of these 
students, therefore, probably reflect both the stresses and tensions of a 
busy post-war period and also the insecurity which students are likely to 
feel when widespread faculty changes are made. 

Freshman students entered the school of nursing at about the time 
that many of the new faculty members assumed their respective positions, 
and, therefore, accepted these new people without question. At the time 
the study was made the freshman students had been in the school of nurs- 
ing for nine months. Though they had had heavy class work during this 
period, they had not been required to carry heavy responsibilities on 
hospital divisions. Furthermore, the experiences which they had had in 
the actual care of patients were carefully selected by their instructors and 
well supervised. 


General Findings of the Study 


Satisfaction scores of the three groups of students of the Duke
University School of Nursing and of the group of 428 students from 12
schools of nursing in Minnesota (6) are given in Table 1. It seems evident,
from this table, that the freshman students of the Duke University
School of Nursing were much better satisfied with nursing than either
junior or senior students in the same school, or the senior students from
Minnesota schools.


Table 1
Satisfaction With Nursing of Students in Schools of Nursing

                           Students from 12      Students of the Duke University
                           Schools of Nursing    School of Nursing
                                                 Per Cent of Total Group
                           Per Cent of           Seniors    Juniors    Freshmen
                           Total Group           N = 70     N = 62     N = 52

Enthusiastic (24-28)             24
Likes it (20-23)                 61.6
Indifferent (16-19)              13
Doesn't like it (12-15)           1.4
Dislikes it (8-11)                0
                                100.0


Mean scores and standard deviations of the various groups on the
Nursing Satisfaction Scale are given in Table 2.


Table 2
Mean Scores and Standard Deviations of Students on the Nursing Satisfaction Scale

                                     Number of               S.D. of
                                     Students      Mean      Distribution

Students from 12 schools of
  nursing in Minnesota                 428         21.8        2.86
Students of the Duke University
  School of Nursing
    Seniors                             70         21.0        2.90
    Juniors                             62         21.6        2.35
    Freshmen                            52         23.2        1.85


The difference between means of Duke junior and senior students is not
significant (t = 1.83). The difference between the mean of the Duke
freshman students and that of the combined² junior and senior group
(t = 7.91) indicates that the freshman students were much better
satisfied with nursing than were the more advanced groups. Differences
between means indicate that the group of 428 senior students from 12
schools of nursing in Minnesota were better satisfied than the combined
junior and senior group of the Duke University School of Nursing
(t = 3.64), but were not so well satisfied as the Duke freshman group
(t = 5.57).


² The mean of the combined junior and senior group is 21.3 and the standard
deviation is 2.68.
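
The combined junior and senior figures in this footnote can be approximately recovered from the separate rows of Table 2. The sketch below is an illustration only: the function name is hypothetical, and it assumes the tabled standard deviations are of the divide-by-N kind, which the paper does not state.

```python
import math


def combine_groups(groups):
    """Pool (n, mean, sd) summaries into one (n, mean, sd) summary.

    Assumes each sd was computed with divisor n (an assumption; the
    paper does not say which divisor was used).
    """
    n_total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / n_total
    # Sum of squared scores in a group is n * (sd^2 + mean^2).
    sum_sq = sum(n * (sd ** 2 + m ** 2) for n, m, sd in groups)
    variance = sum_sq / n_total - mean ** 2
    return n_total, mean, math.sqrt(variance)


if __name__ == "__main__":
    seniors = (70, 21.0, 2.90)  # N, mean, S.D. from Table 2
    juniors = (62, 21.6, 2.35)
    n, mean, sd = combine_groups([seniors, juniors])
    print(n, round(mean, 1), round(sd, 2))
```

Run on the rounded Table 2 entries, this gives a mean of about 21.3 and a standard deviation of about 2.67, close to the 2.68 reported; a gap of that size is what rounding in the tabled figures would be expected to produce.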





Factors Associated With Satisfaction in Nursing 


Almost all students of the Duke University School of Nursing (93 to 
100 per cent) said that they enjoyed bedside care of patients. About 
50 per cent stated that they had opportunities to use their initiative on 
hospital divisions, but only about 20 per cent to help plan work or to ex- 
press their ideas. About 75 per cent felt that ratings on ward work were 
usually fair. Ninety per cent liked the majority of the head nurses, 
teachers, and supervisors, and 75 per cent said that head nurses and 
supervisors usually approved of their work. Eighty to 90 per cent were 
satisfied with the food service and living quarters. About 75 per cent 
felt that they had had adequate care during major illnesses and 60 per 
cent during minor illnesses. 

Percentage differences which are significant at the 1 per cent level 
indicated that junior students, in comparison with seniors, were more 
likely to feel that patients received inadequate care; that doctors de- 
manded too much of nurses; that work on hospital divisions was too 
heavy, and that there was always so much to do that they couldn’t do 
satisfactory work; and that needs of students were subordinated to needs 
of the hospital. A significantly higher proportion of junior than senior 
students complained of chronic fatigue and backache. 

When comparisons were made between the freshman students of the
Duke University School of Nursing and the combined junior and senior
group, percentage differences which are significant at the 1 per cent level
indicate that the freshman students were:


less likely to feel that patients received inadequate care; 

more interested in helping patients solve their mental and emotional
problems;

more inclined to feel that patients liked them and the care which they gave; 

less inclined to feel that work on hospital divisions was too heavy; 

less inclined to feel that needs of students were subordinated to needs of the
hospital;

less inclined to feel that favoritism was shown toward some students; 

more likely to say that head nurses and supervisors were good about 
helping them with things they did not understand; 

more likely to enjoy life in a nurses’ residence; 

more inclined to feel that they had reasonable freedom to do as they liked; 

less likely to say that students had little opportunity to participate in 
making and enforcing rules under which they were governed; 

more likely to say that they frequently had contact with people outside the 
hospital and nurses’ residence; 

less inclined to feel that many of their classes were dull and boring; 

more likely to feel that they had had adequate instruction in personal and 
mental hygiene; 

more likely to say that they had opportunities to practice good mental 
hygiene on hospital divisions and in the nurses’ residence; 

more inclined to feel that they had adequate time for rest and sleep; 

less likely to complain of chronic fatigue and irritability;

more likely to feel that the social program was adequate; and 

more likely to feel that the entire nursing school program had been well
planned.


Problems of Students 


Problems of Junior and Senior Students. The problem listed more 
frequently by junior and senior students than any other was lack of suffi- 
cient time to give adequate care to patients. This was attributed to 
heavy assignments and understaffed wards. The students also com- 
plained of lack of sufficient time for study, recreation, and sleep. They 
disliked having to attend classes during the time they were on night duty. 
Junior students were more concerned than seniors about class work. 

A few direct comments of junior and senior students which seem best 
to illustrate their major problems are listed as follows: 

Because of my grades and attitudes I was constantly afraid of being
dis[missed], particularly during the first year.


I don’t like being criticized about my deficiencies without being allowed to 
make any explanation, or being given any assistance in overcoming such 
deficiencies. 


Faculty members have accused me of not being interested in my work. 
That’s rather silly, I think, because I am, else why should I be here? 


I sometimes wonder whether I am in the right profession. At times I get 
discouraged and ready to quit. 


How can we learn to tolerate and understand those with less experience 
when those with more experience seldom try to understand us. 


We are given adult responsibilities on hospital divisions, yet placed on a 
child’s basis in the nurses’ residence. 


Head nurses expect us always to seem busy. When you stop to chat with 
a patient they think you are loafing. Sometimes patients need this more 
than any actual nursing care you can give. 


One of the supervisors told a student she was just here to work and study
and should not expect anything else. At this point that is the way I feel. 
We are just here to serve the needs of the hospital and nothing else. 


I too frequently feel insecurity rising up inside me. 

I used to be healthy. Now I am just plain tired and badly in need of a
vacation.

Problems of Freshman Students. Freshman students listed almost no 
problems in connection with work on hospital divisions. About 25 per 
cent complained of lack of time for study and lack of knowledge of how to 
study. About 35 per cent felt that they lacked social skills which are 
necessary to feel at ease in social situations. Thirteen per cent were 
sensitive about their personal appearance, and an equal number felt that 
they lacked personal qualities which are needed for success in nursing. 


Suggestions for Improvement


Suggestions of Junior and Senior Students. Typical suggestions made 
by junior and senior students for improvements in the nursing school pro- 
gram are stated as follows: 

We need shorter and better planned hours, particularly during the second
year when students have heavy class work.


More nurses are needed so that patients can be given adequate care. Make 
it possible for students to spend more time with patients. 


More competent head nurses and supervisors are needed; individuals who 
are interested in students and their problems, and who know how to super- 
vise work on hospital divisions. 


Plan for more teaching on the wards where it will really soak in. Give us 
more information about patients, and make it possible for us to attend 
conferences with doctors. 


Improve corrective methods when we make mistakes. If criticisms are 
to be made, talk to the student about them before turning in her record to 
the nursing school office. 


Make courses more complete and interesting. Have more panel discussions 
and better organized, more interesting lectures. 


Place more stress on mental hygiene and good public relations. 
Have more lectures on current topics of the day. 


We need an adviser to help us with our social, personal, and emotional 
problems; someone who knows us individually, and who can help us under- 
stand our weaknesses. 


We need more freedom and liberty to govern ourselves. Treat us as
adults, not as children.

Suggestions of Freshman Students. Most of the suggestions for im- 
provement which were given by freshman students had to do with organ- 
ized class work. They felt that some of their courses should be better 
organized, and that some of their teachers should be better prepared for 
teaching. Of 41 separate comments which were made about experiences 
on hospital divisions, 39 were favorable. The students felt that the ex- 
perience on hospital divisions was very interesting, very beneficial to 
them; that it gave them a feeling of personal worthwhileness, as well as a 
very great deal of satisfaction. 


Attitudes of Freshman Students Toward the School of Nursing Program 


In response to the question “What do you like best about the Duke
University School of Nursing?”, 40 per cent of the students mentioned the
interested and understanding instructors, supervisors, and head nurses; 
34 per cent, the satisfaction of working on hospital divisions; 32 per cent, 
the friendly atmosphere; 30 per cent, the attractive residence; and 20 per 
cent, the well organized and well-taught courses. From 10 to 20 per cent 
mentioned the broad experience which students have on hospital divisions,
the opportunity of working with experienced and highly qualified doctors, 
the democratic student organization, and the access to college life. 


Summary and Conclusions 


Findings of this study suggest that there is a sharp decrease in satis- 
faction with nursing as students progress from the freshman to the junior 
year. At the end of a period of nine months in the school of nursing, the 
freshman students were still a highly motivated and enthusiastic group. 
Junior and senior students, on the other hand, showed many evidences of 
tension and frustration. Comparisons between the two latter groups 
indicated that junior students have more problems in connection with 
class work and with work on hospital divisions than do seniors. The 
many favorable comments which the freshman students made about the 
entire school of nursing program seem in marked contrast to the rather 
unfavorable comments made by the more advanced groups. It seems 
probable, from this finding, that the unfavorable comments are, in part, 
a reaction against the ever-increasing responsibilities which students in 
nursing are expected to assume as they progress through the school of 
nursing. 

Suggestions which the students made for improvement in the school 
of nursing program seem, on the whole, to be thoughtful and reasonable. 
These suggestions indicate that, if more frequent attempts were made to 
determine how students feel and what they believe, the result would be of 
value to both the individual student and to the school of nursing. 

As stated in a foregoing section of this paper, a number of faculty 
changes had been made at the Duke University School of Nursing in the 
year preceding the time the study was made. Attitudes and comments of 
junior and senior students, in comparison with those of freshmen, give 
some indication of the effect upon students of widespread faculty changes. 

In presenting findings of this study it is recognized that a follow-up 
study on one group of students as that group progresses through the 
school of nursing would be of more value than comparisons among three 
separate groups. Plans have been made to do a follow-up study on the 
freshman group. Findings of this project suggest, however, that the 
source of the dissatisfaction and disillusionment of the graduate nurse may 
be traced, at least to a very great extent, to the experiences which that 
nurse has had as a student in a school of nursing. It seems probable that 
the situation in nursing will not improve markedly until it becomes pos- 
sible for students to continue to feel about nursing much as the Duke 
University School of Nursing freshmen did at the end of a period of nine 
months in the school of nursing. To achieve this result it would seem 
essential that steps be taken to relieve students of the heavy responsibilities
which they carry at the present time, and to make it possible for
them to be students of nursing rather than primarily hospital workers. 
However, the problem of staffing hospitals at this period is a crucial one 
which can only be solved through the combined efforts of the hospital ad- 
ministrators, doctors, the schools of nursing themselves, and the public. 
There is, at present, considerable interest in hospitals and schools of 
nursing, but, as yet, much too little real understanding of the problems 
which these institutions face. If studies on the attitudes and satisfactions 
of students in nursing can contribute to an understanding of these prob- 
lems, perhaps the support which is needed to solve them will follow in 
due time. 


Utilizing Findings of this Study 


When studies have been made within a school of nursing it is essential 
that the findings be utilized as soon as possible. Reports of this study 
have been made available to a number of individuals in the Duke Uni- 
versity School of Nursing, the Hospital, and the Medical School who are 
vitally concerned with the welfare of students in nursing. Comments 
of these individuals have indicated that they are interested in the findings 
of this study, and anxious to make changes which are needed to help
students achieve greater satisfaction in nursing. The students who
participated in the study have been interested in the findings. It is 
believed that the opportunity of expressing opinions and of making sug- 
gestions for improvement, in itself, has had a therapeutic effect. In 
addition, students have been stimulated to think seriously about their 
own attitudes and personal characteristics, and to realize that they too 
have a vital stake in the future welfare of the nursing profession. 


Received January 6, 1948. 


How Better Personnel Selection Can Reduce Factory Costs *


William James Giese 
William James Giese, Ph.D., and Associates, Chicago, Illinois 


A. The primary purpose of this analysis is to determine whether or not 
psychological aids for the selection and placement of personnel will save 
more money than the installation and maintenance of such aids will cost. 

B. If such aids can effect significant savings, the secondary purpose
is to outline in general terms the scope of a psychological testing program 
which will meet the needs of the “A” Company. 

C. A further supplementary purpose is to make any observations or 
comments which may be valuable that are indicated by the study even 
though they do not bear directly upon A or B above. 


Method 


A. All of the jobs in the non-exempt category were studied by the
psychologist for the purpose of grouping the jobs into general capacity
families (i.e. groups of jobs in which the capacities required to learn the
work are generally similar).


1. This step was accomplished through a study of the job descriptions.

2. All of the jobs in the shop job families were checked with the Time Study
and Routing Supervisor.

3. All the jobs in the office job families were checked with the Personnel
Director and the Assistant to the Controller.

4. Some minor changes were made in a few of the groupings in the shop
job families as a result of these checks.

5. With some of the shop jobs the work operations were observed jointly.


B. For those shop job families for which such data were available both 
the consistency of individual productiveness and the spread of productive- 
ness of various groups of employees were analyzed. 


1. Only those employees who were on piece work 70% or more of the time 
for pay periods 1-7 and 8-14 (1947) were included. 





* This is the body of a longer report entitled “A Psychological Analysis of Personnel 
Requirements from the Standpoint of the Practicability of Psychological Aids for the 
Selection and Placement of Personnel,” submitted to the major executives of Company A. 
For the purposes of publication, the number of exhibits had to be reduced and the 
detailed appendices eliminated. 

C. General burden costs and retainer charges,¹ considered in connection
with the spread of productivity among employees and break even
points for the selective aids, were computed for various increases in better
selection and placement of employees for different selection ratios.


1. A selection ratio is the number of persons hired to the number con- 
sidered. 
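
As a small illustration of this definition, the ratio can be computed directly. The function name and the applicant counts below are hypothetical and are not taken from the report.

```python
def selection_ratio(hired, considered):
    """Number of persons hired divided by the number considered."""
    if considered <= 0:
        raise ValueError("at least one person must be considered")
    if not 0 <= hired <= considered:
        raise ValueError("hired must lie between 0 and considered")
    return hired / considered


if __name__ == "__main__":
    # Hypothetical figures: 20 persons hired out of 80 considered.
    print(selection_ratio(20, 80))
```

A smaller ratio means the employer is choosing among more applicants per opening.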


D. A study of the terminations for the most recent twelve month 
period was made for each skill family. 


1. These percentages were used in relationship with other data in deter- 
mining the practicability of psychological selection and placement aids. 


E. Such factors as merit ratings, spoilage, reworks, accidents, etc. 
were not used in this study because: 


1. Such data were not readily available. 
2. There were sufficient data for the purposes of this study without them. 


Results 


A. As a result of studying the job descriptions and in some cases ob- 
serving the work being performed, all of the non-exempt jobs were classi- 
fied into 18 general skill groups on the basis of the capacities necessary 
to learn the work. 


1. There are 7 skill groups for the shop jobs. 
2. There are 9 skill groups for the office jobs. 
3. For both the office and shop there is a miscellaneous classification for 
those jobs which did require certain capacities, but because the number 
of employees doing the work was small, it was impractical to set up a 
separate skill family. 
4. For both the office and shop there is an unskilled classification for those 
jobs which require very little capacity to learn. 
5. Table 1 gives the code, the name of the skill family, number of persons
in each family, and the number of terminations in each family.
   a. Appendix I² lists for each of the shop skill families:
      i. The selection requirements.
      ii. The job titles included in the skill family.
      iii. The number of persons in each job as of May 1, 1947.
   b. Appendix II lists the same information for the office jobs.


B. There were sufficient data available to analyze the productiveness 
of four of the 18 skill families. 
C. The percentage of anticipated earned rate³ is a reliable measure
upon which to base other computations such as costs, validation of various
selection techniques, etc.


1. The correlations and other data appear in Table 2.


¹ The difference between the minimum guaranteed base rate and amount earned on
piece rate.

² The original report contained these details in Appendices I and II.

³ The anticipated earned rate is the amount of money an employee who has
completed the learner stage should earn on standard piece work. This is roughly
equivalent to about 70-80% of leveled standard output.


Table 1

Terminations by Skill Families for the Period of May 1946 to
May 1947 (52 week period)

                                                       Number of    Terminations
                                          Number of    Termina-     as Per Cent
Skill Family                              Persons      tions        of Total

Craftsmen mechanics                         114           24            21%
High mechanical                             152           38            25%
Average mechanical                          363          124            34%
Low average mechanical                      213          334           157%
  Molder power squeezer                      44          150           341%
  Low average mechanical (excl. MPSq.)      169          184           109%
Light to medium manual dexterity            176          261           148%
  Inspectors castings E and F                52                        265%
  MDL (excluding insp. cast. E and F)       124                         99%
Overhead and crawler crane operators         21                         52%
Shop clerical                               149                         76%
Unskilled (shop and office)                 335                        127%
General clerical                            156                         56%
Clerical, computational                      61                         46%
Clerical, typing                             10                         80%
Computational                                29                         38%
Computational, typing                        12                         50%
Secretarial                                  37                         78%
Administrative                               46           11            24%
Engineering technicians and draftsmen        61           12            20%
Special trainees                             79           19            24%
Miscellaneous (shop 36; office 55)           91           57            63%
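
The percentage column of Table 1 appears to be terminations divided by the number of persons in the skill family. The sketch below checks that reading against rows where both counts are printed; the function name is hypothetical, and the interpretation is inferred from the figures, not stated in the report.

```python
def termination_rate(terminations, persons):
    """Terminations over the 52 weeks as a percentage of persons in the family."""
    if persons <= 0:
        raise ValueError("persons must be positive")
    return 100.0 * terminations / persons


# (persons, terminations) as printed in Table 1 for rows with both counts.
ROWS = {
    "Craftsmen mechanics": (114, 24),
    "High mechanical": (152, 38),
    "Average mechanical": (363, 124),
    "Low average mechanical": (213, 334),
    "Molder power squeezer": (44, 150),
}

if __name__ == "__main__":
    for family, (persons, terminations) in ROWS.items():
        rate = termination_rate(terminations, persons)
        print(f"{family}: {rate:.0f}%")
```

A rate above 100% means that more people left during the year than were on the roll for that family, that is, the positions turned over more than once.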


D. There is a wide variation in the productivity of people in each skill 
family. This is shown by the sigma figures in Table 2; this fact is graphi- 
cally illustrated in Figure 1 which presents the average percentage of antic- 
ipated earned rate of each individual studied for the first 14 pay periods 
of 1947. 


1. The highest producer turns out from 2.4 to 3.7 times more work than
the lowest producer.

2. The upper 32% of producers turn out from 1.4 to 14 times more work
than the lower 32% of producers.


3. There are some small differences between the four groups in the amount 
of total spread of productivity, but it is so small that it is not significant. 


Table 2

The Reliability and Related Facts Concerning the Percentage of Anticipated Earned
Rate as Meaningful Data for the Purposes of this Study

                            Skill Family
              M1           M2           M3           [?]

N             63           125          81           65
r11           .985         .92          .88          .88
r (est.)      .99          .96          .94          .94
Mean          114 ± 2.5    122 ± 2.4    127 ± 2.4    107 ± 2.7
Sigma         20 ± 1.8     24 ± 1.5     22 ± 1.75    22 ± 1.95
σ(M)          2            4.6          7.6          7.6

N = the number of persons in the study.

r11 = the Pearson Product-Moment Coefficient of correlation for pay periods 1-7 vs.
8-14.

r (est.) = the estimated correlation for pay periods 1-14 vs. 15-28.

Mean = the arithmetical mean.

Sigma = a measure of variability: the range of the middle 68%.

σ(M) = the plus and minus to be kept in mind in regard to one individual's percentage
of anticipated earned rate.


E. It is interesting to note (Table 2 and shown graphically on Figure
1) that mean productiveness varies from 107% to 127% when the
productiveness of individuals is grouped according to the skill family in which
they belong.


1. Most of these differences are statistically significant. 

2. This difference in mean productivity for the different skill families may 
indicate some general differences in ‘‘looseness”’ or ‘‘tightness’”’ of stand- 
ards as established by the time study department. Training in leveling 
and observation for observer-analysts is indicated. 
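
The estimated correlations in Table 2 (pay periods 1-14 vs. 15-28) are consistent with applying the Spearman-Brown formula for a doubled period to the observed correlations for periods 1-7 vs. 8-14, read as .985, .92, .88, and .88. The report does not name its method, so the sketch below is an assumption offered only as a check, and the function name is hypothetical.

```python
def spearman_brown(r, k=2.0):
    """Estimated reliability when the measured period is k times as long.

    Standard Spearman-Brown formula; its use here is an assumption,
    since the report does not say how its estimates were obtained.
    """
    return k * r / (1.0 + (k - 1.0) * r)


if __name__ == "__main__":
    observed = [0.985, 0.92, 0.88, 0.88]  # pay periods 1-7 vs. 8-14
    estimated = [round(spearman_brown(r), 2) for r in observed]
    print(estimated)
```

Under that assumption the four estimates come out at .99, .96, .94, and .94, matching the second row of correlations in the table.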


F. Figure 2 graphically illustrates the differences in productivity of 
individuals within departments and compares the differences of produc- 
tivity between departments as a whole. 


1. In general the variability of productiveness is about as great as it was
when studied by skill groups with the following exceptions:
   a. Departments 3, 4, and 17 have somewhat less spread of productiveness
      of employees, probably due to the type of work involved.
      i. The Press Department is certainly a case of the type of work.
      ii. Even with the somewhat smaller variability, the ratio of high to
          low producer is 1.6, and the upper 32% of producers get out from
          25% to 30% more work than the lower 32% of producers.
2. A most important fact illustrated by Figure 2 is that there are large
differences between average productiveness of the departments.
   a. These differences are significantly greater than those between the skill
      families.
   b. Most people in department 4 have a higher productivity than most
      people in department 17.
      i. This and other differences might be examined by the time study
         department.
      ii. Such differences not only sow the seed for employees' dissatisfactions
          but also negate the results of job evaluation.
G. Table 1 presents a summary of a study of terminations by skill
groups.

1. In general, the higher termination rates are found in the low and
unskilled groups.

a. Working conditions, wages, and related causes account for most of 
the terminations in these groups. 

b. In the higher skill groups in the office, the reasons for terminations 
tend to be personal—i.e., marriage, husband moves, household duties, 
children, etc. 

c. In the lower skill groups in the office jobs, the reasons for terminations 
are working conditions, wages, and related causes. 

d. The data for statements a, b, and c are not presented in table form, 
but they are available among the work sheets. 

2. With the present termination rates, any small improvement in selection 
and placement of personnel with even a large selection ratio and a low 
relationship between the selective techniques and the employees’ success 
would produce results which should be noticed in from 3 to 6 months. 

3. Those areas with very high turnover should be separately studied with
the aim of reducing those factors which must be intrinsic in the work as 
it is now set up. 


H. The relationship between the length of service of the employees 
and productiveness is zero. 

I. The estimated burden charges against wages paid on piece work are
$2,043,280.

1. The calculations for the above are among the work sheets. 


J. Since burden charges remain relatively constant as the productive- 
ness of people on piece work increases, Table 3 shows the savings in 


Table 3 


Savings in Burden Charges as a Result of Improved Selection and Placement of 
Personnel for Different Selection Ratios 
(A relationship between the psychological selection aids and productivity is assumed 
to be a correlation of .40) 








Per Cent Overall          Burden Savings*
Increase in Product.      100% Term.

11.2                      $228,847
 9.2                       187,982
                           155,289
                           130,770
                            85,818
                            65,385
                            44,952
                            24,654





* Based on the fiscal year of 1947, and estimated from the charges during the first 
25 weeks. 
Table 4


Reductions of Retainer Payments as a Result of Improved Selection and Placement of
Personnel for Different Selection Ratios
(A relationship between the psychological selection aids and productivity is assumed
to be a correlation of .40)


Per Cent          Payment          Payment
Decrease in       Reductions*      Reductions*
Retainers         100% Term.       72% Term.

                  $30,143          $21,703
                   26,794           19,292
                   23,444           16,880
                   20,095           14,468
                   17,416           12,540
                   14,736           10,610
                   12,057            8,681
                    9,378            6,752
                    6,698            4,823





* Based on total payments for 1947 estimated from payments for the first 4 months. 


burden charges for different selection ratios assuming a relatively low 
relationship between the selection and placement aids and productivity. 


1. Such savings are predicated on a continued need for high production of 
the entire plant. 
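The arithmetic behind Table 3 can be illustrated with a short sketch. It assumes, as the two fully legible rows of the table suggest, that each saving is the estimated burden charge of statement I multiplied by the per cent increase in productivity. The function name is illustrative and does not come from the study.

```python
# Sketch: burden savings as a fraction of the estimated burden charges.
# Assumption (not stated in the source): saving = burden * percent / 100.

BURDEN_CHARGES = 2_043_280  # estimated burden charges from statement I, in dollars


def burden_saving(percent_increase, burden=BURDEN_CHARGES):
    """Return the saving for a given per cent increase, to the nearest dollar."""
    return round(burden * percent_increase / 100)


if __name__ == "__main__":
    for pct in (11.2, 9.2):
        print(f"{pct:>5.1f}% -> ${burden_saving(pct):,}")
```

Under this assumption the two rows that show both figures (11.2 and 9.2 per cent) reproduce the printed savings.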


K. The reductions possible in retainer payments for various selection 
ratios are presented in Table 4. 


1. The savings due to reductions in retainer payments do not depend upon 
continued high production in the entire plant. 

2. Since the retainer payments represent only a small fraction over one 
per cent of the general burden charges, savings effected on retainer pay- 
ments are over and above the savings made on the general burden 
charges. 
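The two payment columns of Table 4 appear to differ by a constant factor. The sketch below assumes that each figure in the 72% termination column is the corresponding 100% figure multiplied by 0.72 and rounded to the dollar; the variable and function names are illustrative.

```python
# Sketch: check that the 72% termination column is 0.72 times the 100% column.
# The 0.72 scaling rule is an inference from the printed figures, not a stated method.

FULL_TERMINATION = [30143, 26794, 23444, 20095, 17416, 14736, 12057, 9378, 6698]
PRINTED_72 = [21703, 19292, 16880, 14468, 12540, 10610, 8681, 6752, 4823]


def scaled_reduction(payment, rate=0.72):
    """Scale a retainer payment reduction to a lower termination rate."""
    return round(payment * rate)


computed = [scaled_reduction(p) for p in FULL_TERMINATION]
print(computed == PRINTED_72)
```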


Table 5 


Estimated Cost of Installing and Maintaining a Sound Program of Psychological 
Aids for the Selection and Placement of Personnel 
(2000 persons per year assumed volume) 








Item                                   First Year   Second Year

Consultants’ fees                        $4000        $1000
Psychometrist’s salary                    3000         3500
Tests, supplies, etc.                     1000         1000
Rent, heat, depr. of equipment, etc.       750          750

Total                                                 $6250
L. A breakdown and summary of the cost of installing and maintaining
a sound program of psychological aids for the selection and placement
of personnel are presented in Table 5.


Interpretations and Recommendations 


A. As soon as the labor market will permit a rejection of 1 out of 10
of the applicants for employment, psychological testing and other aids
can result in a demonstrable saving of at least $12,000.


1. These aids will not only be valuable for those employees on piece work 
but also for those employees on day work or salary. 

a. Although the results of selecting these persons with the use of psycho- 
logical aids do not lend themselves readily to cost and savings analysis, 
the benefits from them are none the less real and sizeable from a 
financial standpoint. 

b. Improved selection from the standpoint of the capacities of individuals
to learn and do the work for those jobs on day work or salary
can result in increased productivity and proficiency, but it depends
to a large extent on the skill and effectiveness of the foreman or supervisor
to see to it that these capacities are realized.

2. Persons selected with the help of psychological aids not only are capable
of a higher level of production and proficiency but they also learn their
jobs more rapidly.

a. Although no cost data on the savings due to shorter learning or
“breaking in” time are included in this study, such savings, in other
instances, have been found to more than pay for the entire program.

3. An increase in the number of employees placed in work which matches
their capacities better should result in a decrease in turnover.

a. Persons in work which suits their capacities tend to find more satis- 
faction in their work. 

4. The above (i.e., statement A) is based on the expectation of finding a
validity coefficient of .40 between the psychological aids and productiveness.

a. It is probable that a higher coefficient will be obtained. 

b. A validity coefficient of .50 (entirely possible) would increase the
savings discussed in this study by 55%.

c. A validity coefficient of .60 (fairly possible) would double the savings 
discussed in this study. 


B. It is recommended that a sound program of psychological testing 
of applicants for employment be installed. 


1. The present tight labor market will provide an excellent opportunity to 
give the psychological testing procedures a test run. 
a. At the present termination rate sufficient data can be gathered for 
most of the larger skill families within 3 to 9 months. 
b. This will make it possible to test and check the procedures thoroughly. 
i. Thus, when the labor market eases, tables based on experience with 
previously hired employees will allow the most efficient use of a 
more favorable selection ratio. 
C. It is recommended that a psychological testing program for aiding
in the selection and placement of new personnel of the following scope
be installed:


1. All applicants except those for jobs in the unskilled group be tested as 
indicated. 

a. Appendices I and II list the capacities and/or proficiencies to be
measured as well as the tentative standards for each skill family.

2. A psychometrist be employed to administer the tests and to do related
work.
a. This would be about a half to two-thirds time job when processing
2000 applicants a year.
3. A system of follow-up must be an integral part of the program. 

a. Some measure—probably a good merit rating—is needed for those 
employees not mainly on piece work. 

b. If at all practicable, such data as spoilage, reworks, accidents, etc., 
should be part of the follow-up against which to measure the effective- 
ness of the psychological selection aids. 

c. Productivity should be used to measure the effectiveness of the
program.

d. This follow-up is part of the psychometrist’s job. 

4. The psychologist will engineer the installation of the program and will 
direct the follow-up work. 


Received December 16, 1947. 





Job Evaluation Simplified: The Utility of the Occupational 
Characteristics Check List * 


Roger M. Bellows and M. Frances Estep 


Department of Personnel Methods, School of Business Administration,
Wayne University


Job evaluation can provide a valuable system of known dependability 
for agreements between workers and management on the delicate matter 
of employee pay rate schedules. That the present hit-or-miss systems 
of job evaluation are of value is suggested by their widespread use. How- 
ever, nearly all who have set up such systems will agree that much devel- 
opment and appraisal of methods is needed. Appraisal of the utility of 
job rating devices is of considerable significance looking toward improve- 
ment of job evaluation. 

Jay Otis conducted an unpublished study in which he found that a 
method for scoring the Occupational Characteristics Check List did not 
yield scores showing any significant relationship to evaluated points ob- 
tained from job evaluation. In the present paper further examination is 
made of the usefulness of the Occupational Characteristics Check List 
(OCCL)¹ in job evaluation.

The OCCL was developed in 1935 at the Baltimore Center of the 
Occupational Research Program. It is a form for estimating what 
amounts of 47 or more traits or abilities are needed by the worker to do 
the job. A forerunner of the check list was developed by Viteles,²
which he called a job psychograph. The OCCL was used by the Worker 
Analysis Section of the Occupational Research Program in developing job 


* The authors express appreciation to Mr. I. W. Winkelman, to the controller, the 
personnel director, and the department heads of his organization, and especially to 
Miss Eleanor Yunis who assembled the job descriptions and specifications used in 
training the committee members, and to the members of the job evaluation committee 
for their thoughtfulness in rating the jobs. 

¹ The Worker Characteristics Form (now called the Occupational Characteristics
Check List) is shown, and its uses described, in William H. Stead, Carroll L. Shartle,
and Associates, Occupational counseling techniques. New York: American Book Company,
1940, pp. 175-183. The same check list is discussed in Dale Yoder, Personnel
management and industrial relations. New York: Prentice-Hall, Inc., 1942, pp. 103-106,
and also in Jay L. Otis and Richard H. Leukart, Job evaluation. New York: Prentice-Hall,
Inc., 1948, pp. 84-88.

² Morris S. Viteles, Industrial psychology. New York: W. W. Norton and Company,
1932, pp. 150-153.
families and in the Job Analysis Section of the Program in connection
with job analysis of somewhat more than 10,000 occupations. It is a 
common technique in job study, but it probably has not been used very 
much in job evaluation. 

The present study describes an experiment in the evaluation of main 
office jobs in a women’s specialty store chain. The members of the job 
evaluation committee were trained to use job descriptions, job specifica- 
tions, and the OCCL to evaluate these jobs in terms of a simple job evalu- 
ation system. 


The Job Analysis and Evaluation 


In a women’s specialty store chain organization a program for job 
analysis and evaluation of 53 main office jobs was undertaken, for the 
purpose of agreeing upon fair and equitable salary rate ranges. The plan 
was presented to the employees through an employee memorandum set- 
ting forth the objectives of the program. It was also pointed out that a 
recent employee opinion survey had suggested the use of job analysis and 
job evaluation. The memorandum is shown below: 


To ALL EMPLOYEES: 


The company will begin in the near future a program of job analysis and 
evaluation of Main Office jobs, the purpose of which is to insure fair and 
equitable salary rate ranges. Job analysis and evaluation is a device for establishing
and maintaining rate ranges and is now used among progressive retail
firms today. Numerous responses in the Employee Opinion Survey in which 
you participated suggested the use of job analysis and evaluation. 

This program is intended to accomplish the following: 


1. Provide a complete record of the content of all jobs and their require- 
ments so as to assist the company in recruiting and training new employees for 
specific jobs; 

2. Provide a clear picture of the lines of progression within each department 
and between departments; 

3. Set up a sound relationship between all jobs in the office in terms of job 
knowledge, responsibilities, mental and skill requirements, working conditions, 
etc., associated with performance of work. Salary ranges set up as the result 
of the job evaluation will recognize differences in the above-mentioned require- 
ments. However, no individual earnings will be reduced as the result of this study; 

4. Provide the framework for a merit rating plan which will assure periodic 
and fair appraisal of the individual performances of all employees. 


A committee has been established which will carry on the work of job 
analysis and evaluation under the guidance of an outside specialist experienced 
in this work. The committee will consist of the following members: 


1. Permanent employee member (selected by vote of employees)
2. One other employee from the particular department under review
3. Permanent department head member
4. The department head from the department under review
5. Job analyst member
6. Personnel department member
7. Top management member
8. Technical advisory member.
You will note that the committee is representative of the various levels of
responsibility. 

The job analyst will prepare analyses of all jobs after talking with the 
department head and the employee. The committee will analyze these jobs 
with the participation of the employee members. After the jobs have been 
analyzed and approved by the committee, they will be evaluated and grouped 
into rate ranges by the committee. 

Job analysis and evaluation is not a speed-up or efficiency development 
procedure or a money-saving device. It will be to the mutual benefit of the 
company and its employees to establish a systematic process of maintaining
salary ranges in proper relationship to each other and to the requirements
of the job.

Since each of you will aid in the job analysis phase of the program, your 
whole-hearted cooperation is urgently requested.

Signed

Company President
THE COMMITTEE
Committee signatures


The next step in the program was indoctrination of the committee mem- 
bers. The general purposes and specific objectives of job analysis and 
evaluation were discussed. The duties and functions of the committee 
were clarified. In subsequent training sessions the use of the job analysis 
form was illustrated, including the OCCL, for estimating what amounts 
of 47 or more traits or abilities are needed by the worker to do the job. 

The job analyst then prepared a few job analyses which were brought 
before the committee for review, revision, and approval. Job analyses for 
the remainder of the jobs were prepared in accordance with the modified 
procedure growing out of the committee’s discussion. About one meeting 
for every three job analysis schedules was required. The OCCL was 
discussed in detail for each job. Final approval by the committee of the 
content of the job analyses completed this phase of the program. Each 
committee member spent approximately 64 hours during the 33 com- 
mittee meetings in study and discussion of the jobs; each may be said to 
be quite familiar with the descriptions, specifications, and characteristics 
required in the 53 jobs before the job evaluation phase of the program was 
undertaken. 

The job evaluation phase of the program began with a discussion of 
suggested method and procedure. The committee then discussed job 
factors to use for evaluating the jobs. It was agreed to evaluate the jobs 
in terms of two factors: responsibility and training and experience. 
(Commonly-used factors such as working conditions and job hazards 
were omitted since the jobs under consideration were deemed the same in 
such characteristics.) 


Definition of the Two Factors 


The Factor of Responsibility—as indicated in the job descriptions: 
1. Responsibility for employee relations; 2. Responsibility for public relations;
3. Responsibility for handling money; 4. Responsibility for care of
merchandise, material, and equipment; 5. Responsibility for records 
(accuracy); 6. Responsibility for confidential information; 7. Responsi- 
bility for number of employees supervised, and type of supervision given; 
8. Responsibility for functioning in the absence of supervision; and 
9. Responsibility for store controls and store personnel relations. 

The Factor of Training and Experience—as indicated in the job specifi- 
cations: 1. Experience required; 2. Minimum training required on the job 
to reach normal production; 3. Technical or vocational training required; 
and 4. Formal education required. 

This small number of factors seemed appropriate in view of the study 
by Lawshe and Wilson* of job evaluation systems using only three or four 
factors. They say, “The final job rank seems to be determined by judgments
on a limited number of factors, regardless of the particular type of
procedure or the number of point scales through which the raters arrive
at the final ratings of the job.” Results seem to be as defensible as those
of the more elaborate systems utilizing a greater number of factors. 

The actual evaluation was done by sorting a pack of cards, with a job 
title on each card, into ranks from high to low for one factor at a time. 
Evaluation was made first for the training and experience factor and 
then for the responsibility factor. The rating judgments were made 
completely independently by each of the five judges or raters who were 
members of the committee. There was no discussion of the jobs after 
the evaluation was begun. However, each of the raters had available to 
them the job descriptions, specifications, and OCCL’s for reference 
throughout the evaluation. 

After the ratings had been completed, the total number of points for 
each job was determined by adding the rank points for both factors of 
all five raters (i.e., 2 factors × 5 raters × 1 to 53 rank points). The possible
spread of total rank points used for the 53 jobs could range from a low
of 10 points to a high of 530 points. The actual spread was 18 to 523 
points. 
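The point totals described above can be sketched in a few lines. The data layout (a dictionary mapping each factor to the five raters' rank points) and the function name are assumptions made for illustration; the source states only that rank points were added across both factors and all five raters.

```python
# Sketch: total rank points for one job under the two-factor ranking scheme.
# The dict-of-lists layout is hypothetical; only the summing rule is from the text.

N_JOBS = 53
FACTORS = ("training_and_experience", "responsibility")
N_RATERS = 5


def total_rank_points(ranks_by_factor):
    """Sum the rank points (1..N_JOBS) given by every rater on every factor."""
    total = 0
    for factor in FACTORS:
        ranks = ranks_by_factor[factor]
        if len(ranks) != N_RATERS:
            raise ValueError(f"expected {N_RATERS} ranks for {factor}")
        if any(not 1 <= r <= N_JOBS for r in ranks):
            raise ValueError("rank points must lie between 1 and 53")
        total += sum(ranks)
    return total


# The theoretical limits: every rater assigns rank point 1, or rank point 53.
lowest = total_rank_points({f: [1] * N_RATERS for f in FACTORS})
highest = total_rank_points({f: [N_JOBS] * N_RATERS for f in FACTORS})
print(lowest, highest)
```

The two limiting cases give the possible spread of 10 to 530 points quoted in the text.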

The data were examined for reliability (Spearman-Brown, split half, 
N = 53). The estimated reliability index for the training and experience 
factor was .97; for the responsibility factor, .95. When these factors 
were combined, in terms of unit weights, the estimated reliability cor- 
relation was .97. The correlation between the average rank on the two 
factors was .96 + .01 (S.E.). This might suggest that only one factor 
would be needed since virtually all of the variance contained in the 
training and experience factor is also present in the responsibility factor. 
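For readers unfamiliar with the split-half procedure, the Spearman-Brown step-up and the shared-variance argument can be sketched as follows. The half-test correlation used in the example is hypothetical, since the source reports only the stepped-up values; the function names are likewise illustrative.

```python
# Sketch: Spearman-Brown step-up for a split-half correlation, and the
# proportion of variance two measures share (the squared correlation).


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def shared_variance(r):
    """Proportion of variance common to two variables correlated r."""
    return r ** 2


if __name__ == "__main__":
    # 0.94 is a hypothetical half-test correlation, not a figure from the study.
    print(round(spearman_brown(0.94), 3))
    # The reported correlation of .96 between the two factors.
    print(round(shared_variance(0.96), 4))
```

With the reported correlation of .96, the two factors share about 92 per cent of their variance, which is the basis for suggesting that one factor might suffice.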


*C. H. Lawshe, Jr., and R. F. Wilson. Studies in job evaluation. 6. The reliability 
of two point rating systems. J. appl. Psychol., 1947, 31, 355-365. 
Occupational Characteristics Check List Data


The OCCL was scored in a simpler manner. Each trait or ability re- 
quired of workers had been rated by the job analyst and discussed, modi- 
fied, and agreed upon by the committee as: 


A. Very high degree of the characteristic required in some element of
the job.

B. Above average amount of the characteristic required, either in num- 
erous elements of the job or in the major or most skilled element. 

C. Medium to very low degree of the characteristic required in some 
element or elements of the job. 


For “A” amount of the trait, a score of 3 was given; for “B”, a score of
2; and for “C”, a score of 1. The summary of points obtained in this
manner was the occupational characteristics score used in the computa- 
tions reported below. The range of scores was from 10 to 62. The 
raters had no knowledge that this score was to be derived from their 
judgments made during the committee meetings on the study and ap- 
proval of the job analyses or during the job evaluation meetings. 
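The scoring rule just described amounts to a weighted count of letter ratings. The sketch below assumes a job's check list is available as a simple list of letters, one per rated trait; the example ratings are invented for illustration. Since the reported scores run as low as 10, it seems likely that only traits actually rated for a job entered its score, but the source does not say so.

```python
# Sketch: occupational characteristics score from A/B/C trait ratings.
# The list-of-letters input format and the sample ratings are hypothetical.

POINTS = {"A": 3, "B": 2, "C": 1}


def occl_score(ratings):
    """Sum the points for each rated trait (A = 3, B = 2, C = 1)."""
    return sum(POINTS[rating] for rating in ratings)


if __name__ == "__main__":
    example_job = ["A", "B", "B", "C", "C", "C"]  # invented ratings
    print(occl_score(example_job))
```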

The following double-entry table shows the relationship obtained be- 
tween the total evaluated points and this occupational characteristics 
check list score (Table 1). 


Table 1 


The Relationship Between Total Evaluated Points and the Occupational 
Characteristics Check List Score 


Note: r = .74 ± .06 (S.E.)

[Double-entry table of OCCL Score (rows, Low 1 to High 5) against Total
Evaluated Points (columns, Low 1 to High 5); cell frequencies not legible.]


The Pearson coefficient of correlation between these two variables was
.74 ± .06 (S.E.). The distributions for both variables were normalized
into an approximate 10-20-40-20-10 per cent distribution for purposes
of presenting the data in the double-entry table, Table 1. The Pearson
coefficient of correlation obtained between these two variables without
normalizing the distribution was .79 ± .05 (S.E.). A rather high relationship
was expected since the raters had been trained in the use of the
OCCL. The judgments were made by people familiar with a common
body of information about the job content, requirements, and specifications
of the jobs. The considerable relationship found to exist between
OCCL scores and total evaluated points is suggestive of simpler
and perhaps better, more valid ways of evaluating jobs.
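The standard errors attached to these coefficients can be checked with a brief sketch. The source does not state which formula was used; the classical large-sample formula, S.E. = (1 - r²)/√N, is assumed here because it reproduces the printed values for N = 53.

```python
# Sketch: classical large-sample standard error of a Pearson correlation.
# That the authors used this formula is an assumption; it matches their figures.
import math


def se_of_r(r, n):
    """Standard error of r under the classical formula (1 - r^2) / sqrt(n)."""
    return (1 - r * r) / math.sqrt(n)


if __name__ == "__main__":
    for r in (0.74, 0.79):
        print(r, round(se_of_r(r, 53), 2))
```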

If scores derived from the OCCL or an improved version of it could 
eventually be used immediately for evaluation instead of evaluated points, 
the job evaluation phase of the program as conducted in this study could 
be entirely eliminated. This would have resulted in a saving of some 12 
committee hours or 60 individual hours, plus time for planning, discussion, 
and analysis performed outside the committee meetings. 

Development and evaluation by validity studies may in the future 
yield job characteristic check list systems of considerable dependability 
when used with well-trained committee members. Criteria for such 
validity studies may well take the form of median salaries earned for the 
same occupation in the community. If so, much uniformity in use of job
titles as well as more adequate job descriptions and specifications through- 
out the industrial and business community is needed. 


Conclusion 


Results show that the Occupational Characteristics Check List had 
some utility in job evaluation when the members of the job evaluation 
committee were carefully trained. It provided a rough measure of the 
value of the jobs in a considerably shorter time than was required in job 
evaluating, as conducted with a simplified job evaluation system. The 
OCCL scores were found to correlate .74 ± .06 (S.E.) with total evaluated
points for a population of 53 jobs. Further development and appraisal of 
the check list system of evaluating jobs seem indicated. 


Received May 4, 1948. 
Early publication. 





Fakability of the Strong Interest Blank and the 
Kuder Preference Record * 


Howard P. Longstaff 
University of Minnesota 


Interests have long been considered one of the important factors in 
vocational adjustment. Two interest tests have become nationally prom- 
inent, the Strong Interest Blank and the Kuder Preference Record. 
Since both of these tests depend upon the subject’s statement of his likes, 
dislikes, and indifferences, a crucial question is how susceptible these 
tests are to faking. Some evidence exists that faking is possible on the 
Strong Interest Blank (1, 2, 5, 7). There has been a feeling on the part
of vocational psychologists, however, that even though some faking is 
possible on the Strong, it is probably less susceptible to malingering than 
the Kuder (4). 

The purpose of this study was to explore the fakability of both in- 
struments, and to make a comparison between them as to which was the 
more fakable. Since the tests are not scored in the same manner nor 
scores reported in the same terms, exact comparisons are difficult, but 
rough comparisons are possible. The subjects were 59 students, 24 
women and 35 men on the Strong and 22 women and 37 men on the Kuder 
in an evening Extension Division class in Vocational Development and 
Personnel Psychology at the University of Minnesota. These subjects 
were mature individuals, most of whom were employed. They are 
probably representative of the more intellectual type of person one would 
meet in the industrial employment office. In other words, they are the 
type of person who would be given psychological tests in actual industrial 
selection offices. 

The experimental procedure was as follows: The subjects first took 
the Strong Interest Blank and the Kuder Preference Record as a part of 
a battery of tests given in the laboratory part of the above mentioned 
course. They were instructed to be as frank and honest as possible as the 
results would be used to help them in evaluating their vocational choices. 
After they had taken the tests under these conditions, it was then pointed 
out to them that part of the value of a psychological test, to be used in 
selecting employees, is its imperviousness to malingering and here was a 

* This study was made possible by a grant-in-aid from the Graduate School of the 
University of Minnesota. 
chance for them to discover how well they individually could fake the
results on the two measuring devices as well as helping to discover the 
general fakability of these tests. 

The men’s form of the Strong blank was used for both sexes, but the
Kuder percentiles were based on the women’s norms for the women and
the men’s norms for the men.

The subjects were instructed to try to lower their scores on the com- 
putational, persuasive, social service, and clerical divisions of the Kuder 
and the accountant, life insurance salesman, real estate salesman, personnel 
director, and office man divisions of the Strong blank. Similarly, they
were to try to raise their scores on the mechanical,
scientific, artistic, literary, and musical divisions of the Kuder and the carpenter,
mathematician, engineer, physicist, chemist, artist, author-journalist,
and musician parts of the Strong blank. The various groups
to be faked were written on the blackboard and the direction of the faking 
indicated. Thus, the subject would look at a test item and try to
answer it so it would boost the “fake-up” groups and depress the
“fake-down” groups.

This was a complicated faking procedure as all the above mentioned 
interest groups had to be faked simultaneously. Thus the subjects were 
faking some groups up at the same time they were faking others down. 
Such being the case, our results are probably not as pronounced as they 
would have been if the subjects were trying to fake only one interest 
category. It must be kept in mind that when one interest category is 
faked it automatically changes the scores on other interest categories in 
the tests. 

The data were treated as follows: The number and per cent of men and 
women who did one of the following things were computed. 


Strong 


Moved from C or C+ to B-, B or B+ (one letter group up)
Moved from A to B+, B or B- (one letter group down)
Moved from C or C+ to A (two letter groups up)
Moved from A to C+ or C (two letter groups down)
C or C+ to C or C+; B-, B or B+ to B-, B or B+ (no faking up)
A to A; B+, B or B- to B+, B or B- (no faking down)
A to A (could not fake up)
C or C+ to C or C+ (could not fake down)
Those who moved in wrong direction.


Kuder 


Moved from 0-24 percentile to 25-74 percentile (one group up)
Moved from 75+ percentile to 74-25 percentile (one group down)
Moved from 0-24 percentile to 75+ percentile (two groups up)
Moved from 75+ percentile to 0-24 percentile (two groups down)
0-24 percentile to 0-24 percentile or 25-74 percentile to 25-74 percentile
(no faking up)
75+ percentile to 75+ percentile or 74-25 percentile to 74-25 percentile
(no faking down)
75+ to 75+ percentile (could not fake up)
0-24 to 0-24 percentile (could not fake down)
Those who moved in wrong direction.
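The tallying categories listed for the two tests can be expressed as a single classification rule over three ordered groups (low, middle, high). The sketch below is one possible reading of those categories, not the author's actual tabulating procedure; the function names, the numeric group codes, and the choice to treat exactly the 75th percentile as the high group are assumptions.

```python
# Sketch: classify a subject's movement between honest and faked scores.
# Groups are coded 0 (low), 1 (middle), 2 (high); the coding is illustrative.

STRONG_GROUP = {"C": 0, "C+": 0, "B-": 1, "B": 1, "B+": 1, "A": 2}
_WORDS = {1: "one group", 2: "two groups"}


def kuder_group(percentile):
    """Map a Kuder percentile to the 0-24, 25-74, or 75+ band."""
    if percentile < 25:
        return 0
    if percentile < 75:
        return 1
    return 2


def classify(before, after, direction):
    """Label the change from `before` to `after` for a fake-'up' or fake-'down' task."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    sign = 1 if direction == "up" else -1
    limit = 2 if direction == "up" else 0
    if before == limit and after == limit:
        return f"could not fake {direction}"  # already at the extreme group
    moved = (after - before) * sign
    if moved < 0:
        return "moved in wrong direction"
    if moved == 0:
        return f"no faking {direction}"
    return f"{_WORDS[moved]} {direction}"


if __name__ == "__main__":
    print(classify(STRONG_GROUP["C+"], STRONG_GROUP["A"], "up"))
    print(classify(kuder_group(80), kuder_group(50), "down"))
```

Coding both tests onto the same three groups is what allows the rough comparison between letter grades on the Strong and percentile bands on the Kuder.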


Results 


Tables 1 and 2 contain the data for the Strong and Tables 3 and 4 
portray the data for the Kuder. These data are summarized in Tables 
5, 6 (Strong), 7, 8 (Kuder), while Tables 9, 10 and 11 show the comparative
data on the two tests.¹

These data indicate that even under the very complex and difficult 
situation of simultaneously faking several different interest categories 
upward and downward, both the Strong and Kuder tests are vulnerable. 
Some interest categories were easier to fake than others for this group of 
subjects. Using the most rigorous criterion of faking, at least two letter 
grades (see Table 9), four interest categories, chemist, artist, author- 
journalist, and musician, on the Strong and one, artistic, on the Kuder are 
successfully faked upward by over 49 per cent of the male subjects. Two 
other categories on the Strong, engineer and physicist, are successfully 
faked upward by over a third of the male subjects. Only two categories, 
carpenter and mathematician, on the Strong and four on the Kuder, 
mechanical, scientific, literary, and musical, are faked upward by less than 
one third of the male subjects. On the fake-downward categories we find
less successful faking, but even here four out of five categories on the
Strong (accountant, real estate salesman, personnel director, and office
man) are faked downward two letter grades by over 20 per cent of the
subjects. On the Kuder, two categories, persuasive and social service,
are faked downward two groups by over 60 per cent of the subjects, while
clerical and computational are successfully faked downward from high to
low by 27 and 19 per cent respectively of the male subjects. Thus, the
Kuder was easier for these subjects to fake downward while the Strong 
was noticeably easier to fake upward. In general, the female subjects are 

¹ To reduce printing costs, Tables 1 to 8 inclusive have been deposited with the
American Documentation Institute. Order Document 2513 from American Documentation
Institute, 1719 N Street, N. W., Washington 6, D. C., remitting $0.50 for microfilm
(images 1 inch high on standard 35 mm. motion picture film) or $0.50 for photocopies
(6 x 8 inches) readable without optical aid.
less successful in faking than the men. This may be due to the generally
known fact that women as a group seem to have less definitely structured 
occupational interests. 

When the less rigorous criterion of faking, one or two letter grades, 
was used (see Table 10), we find over 74 per cent of male subjects faking 
upward successfully on seven out of eight of the interest categories on the 
Strong and in two out of the five interest categories on the Kuder. On 
the fake downward categories all the occupational groups (herein meas- 
ured) on the Strong are successfully faked by 63 or more per cent of the 
male subjects and three out of four of the categories on the Kuder are 
faked downward by 59 or more per cent of the male subjects. On this 
more lenient criterion the women make a better showing but are still 
somewhat less successful in faking than the men. 

The data we have just been discussing, those presented in Tables 9 
and 10, may be misleading due to the fact that the Strong’s Interest Blank 
and Kuder Preference Record are scored differently. Scores on the Kuder 
are reported in percentile ranks based on a randomly selected group of sub- 
jects which means that the majority of subjects who have low interest 
ratings cluster around the mean or in the group we have designated as 25 
to 74 per cent. Thus they can’t move either upward or downward more 
than one group. Scores on the Strong blank are based not upon a 
randomly selected norm group but upon specific groups of successfully 
employed men in the various occupations who have high interests in the 
interest categories for which keys are available. According to Strong, 
about 40 per cent of college students score at the C or C+ level on his 
blank, which means in a typical group of college students about half of 
them have two letter grades available to fake upward on the Strong but 
only one group on the Kuder. In our data, approximately 79 per cent 
of males and 62 per cent of females score C or C+ originally on the fake 
upward categories on the Strong as compared to approximately 37 per 
cent of males and 26 per cent of females who score below 24 percentile on 
the Kuder. In a similar fashion approximately 34 per cent of males and 
31 per cent of females scored A originally on the Strong fake downward 
categories as compared to approximately 55 per cent of males and 35 per 
cent of females who score over 75 percentile on the Kuder. Thus it is 
apparent that our rigorous criterion (Table 9) gives the Kuder an unfair 
advantage. Probably a more realistic comparison is one based upon the 
per cent of subjects who can fake upward to A scores on the Strong and 
to 75 percentile or above on the Kuder or downward to C scores on the 
Strong or below 24 percentile on the Kuder. These are the significant 
levels recommended by the authors of these tests and are the ones usually 
considered in vocational counseling. Table 11 presents such a comparison. 
Here again the outstanding fact is that both tests are fakable, the 
Strong considerably more so than the Kuder on the fake upward categories 
and the Kuder more so than the Strong on the fake downward categories 
for the male subjects. The women are less successful in faking by this 
criterion, as they were on our other criteria, with a couple of notable 
exceptions: they are surprisingly good at faking the Kuder upward on the 
mechanical category and do much better than the men in faking the 
Kuder upward on the scientific category. 

Because the tests are fakable it does not follow that they are faked in 
general practice. Terman (6) found little faking on his masculinity- 
femininity test even when his subjects knew the purpose of the test yet 
when told to fake they were able to do so to a marked degree. The same 
is true with the Strong interest tests. Strong (7, pp. 686-687) makes the 
following statement: “The large number of correlations over .80 and 
particularly over .90 . . . are good evidence that there is remarkable 
consistency in response to interest items. A small amount of fudging 
would make such high correlations unlikely.” This is the best answer 
possible to our problem. The facts show that little faking goes on in the 
guidance situation. However, what goes on in the hiring situation is an 
entirely different matter. It would seem from these data that special 
effort should be made by the examiner when using these tests to stress the 
desirability of truthful answers. Strong (7, p. 690) has given several sug- 
gestions for overcoming faking. These are: (a) emphasize speed, which 
reduces the time for thinking out faked answers; (b) view very high scores 
with suspicion because faking tends to produce abnormally high scores; 
(c) consider scores on secondary interests; (d) develop new norms where 
items obviously related to the occupation are reduced in weight or omitted 
entirely; and (e) emphasize to the subject that his future will suffer if 
he gets into a job he dislikes. 

Brief comment on these suggestions is in order. Very high scores may 
indicate faking but it does not follow that faking will always produce 
high scores. In our data, out of 531 possible “fake upward” scores only 
60 scores were as high as a standard score of 60 or above on the Strong 
blank. In regard to the suggestion that new norms be developed, 
Steward (5) has done this but still finds the reduction in the seriousness of 
faking “not enough to recommend the keys for their value as a protection 
device.” Steward (5) also strongly recommends that those who use the 
Strong test as part of the Steward selection battery for life insurance 
salesmen should emphasize to the subject the futility of faking and 
thereby getting into a job which he may later dislike. 

Strong and Kuder might well consider the addition of a second set of 
directions for their tests headed “directions to be used in employment 
offices.” These special directions should emphasize speed and the fact 
that in the long run the applicant is only prolonging his vocational diffi- 
culties if he fakes the tests. In view of this not too hopeful picture two 
other suggestions for further research seem indicated. Add a set of items 
to the blanks such as those used in the Minnesota Multiphasic Personality 
Inventory (3) and other personality tests which attempt to detect a sub- 
ject who is putting himself in an unduly favorable position (the L score 
on the MMPI). The difficulty with this procedure, however, is that 
the sophisticated subjects may not “bite” on these items. Another 
possibility is to make an empirical study of present items in the attempt 
to locate items distorted in faking but which are not obvious even to the 
sophisticated “faker.” The approach might well follow the techniques 
used in developing the “K scale” on the MMPI. 

Faking affects the interest maturity and occupational level scores 
on the Strong in the manner one would expect. The interest maturity 
scores drop from an average score of 57 on the original test to 45 on the 
fakes. As it happens, the occupations to be faked downward are those 
which normally have high I. M. scores and those to be faked upward have 
low I. M. scores. The lowering of the I. M. score in a way validates the 
fact that multiple faking was involved and that the results are not just 
due to changing one occupation and thus causing all others to change. 
It probably also is the result of choosing the more obvious and stereotyped 
items usually associated with the occupations in question. Since we find 
such a large shift in I. M. with faking further study of this phenomenon is 
indicated as a possible indicator of faking. 

The occupational level remains practically unchanged, moving from 
an average of 56 for the original to 54 on the faked, illustrating the can- 
celling effect as one shifts some occupations upward and the others down- 
ward. 


Summary 


Considering the Kuder as a whole and the thirteen interest categories 
herein studied on the Strong: 


1. Both tests are decidedly fakable. 

2. Some interest categories are more fakable than others. 

3. Women are less successful in faking than men. 

4. The Strong test in general is easier to fake upward than the Kuder, 
while the Kuder is easier to fake downward than the Strong. 

5. It does not necessarily follow that much faking goes on in actual 
use of these tests. The potential danger is present, however. 

6. The interest maturity and occupational level scores behave as 
would be expected. Further study of the I. M. scale as an index of faking 
is indicated. 

7. A new set of directions should probably be made for both tests in 
order to minimize faking. 

8. Further research is indicated to explore the possibility of devel- 
oping an empirical scale to detect faking. 


Received November 14, 1947. 


Words Are Dynamite * 


R. Stafford Edwards 
Edwards and Company, Inc., Norwalk, Connecticut 


When Noah Webster’s wife caught him kissing the cook, she is sup- 
posed to have exclaimed, “Why, Noah, I’m surprised,” and he to have 
answered, “No, dear, I am surprised, you are astonished.” 

Another story has it that one wife got so annoyed at another who 
bragged about her husband being sophisticated that she looked it up in 
the dictionary . . . and had the great pleasure at their next meeting of 
agreeing that he was indeed a fakir . . . for its meaning truly is “falsely 
or fallaciously worldly-wise.” 

Such misconception of words has caused much amusement and little 
damage but imagine the tragedies that might occur if some evil-minded 
group succeeded in educating children to the misconception that “poison” 
meant “bread.” An erroneous use of words has been instilled into re- 
lations between employers and employees, and even those who do not 
believe there is real class hatred fan its fires by constant misuse of those 
words. 

Employers, union officers and high government officials who wouldn’t 
tolerate loose handling of dynamite near their headquarters toss explosive 
words around with utmost carelessness. 

Foremost in that class is the word “labor.” Dictionaries define it 
simply as “mental or physical toil.” There is no definition giving it 
status as representing any group of people. In the eras when Europe had 
suppressed groups their leaders coined terms to fan their hatred of the 
idle group which did the suppressing. In France the term “Bourgeoisie,” 
for example, really defined the great middle class which is typical of our 
American population. 

There is no “landed” class or “nobility” in the United States. In 
fact, in France the term “Democrat” was synonymous with “Bourgeoisie” 
and the United States has grown to be commonly known as a “Democ- 
racy” for perhaps that reason . . . although the term inaccurately describes 
our type of government which is really “republican,” with gov- 
ernment by representatives of the people, not by the people themselves. 

* The editor solicited this paper from Mr. Edwards, who is President of an organi- 
zation that has been in existence for 76 years. Its amicable relations with employees 
and high rate of production have never been interrupted. The care with which they use 
words may be in part responsible. Whether readers agree or disagree with his particular 
terminology, they will welcome having their attention directed to the dangers that lie 
in the use of emotionally charged words.—Editor. 

During the era leading up to the Russian revolution the sickle and 
hammer became the symbols for the “workers” but you may perhaps re- 
call that at that time Americans shuddered in horror at the term “Bol- 
shevik.” It seemed to us all that was cruel, ruthless and vicious but 
really meant “Extreme Left Socialist.” They are still “Bolsheviki” and 
the newer term “Communist” is descriptive of a form of government only. 

Because there is no real class difference in this country the term- 
inology had to revolve around those who, at the moment, were employers 
and those who were employed even though they were all workers at their 
own jobs. In the drive to sell employees the idea of unionization the 
word “labor” has been mutilated to mean everything representative of 
those who work . . . even to the point of spelling it with a capital as we 
do the proper noun “American.” 

Leaders of the union movement have further exaggerated the word 
“labor” as inclusive of all who work when, in fact, their own statistics, 
and those of the Department of Labor, show considerably less than half 
of those who work to be union members. 

Without for one moment questioning the right or the desirability of 
employees to organize into unions, it is nevertheless true that these terms 
were appropriated by leftists and extremists to create a class struggle in 
this country that would make organization easier . . . as it was in the 
countries having actual class differences. So, in place of “nobility” they 
wrapped everyone who employed into a class named “management” or 
“industry.” Of course, that must include the little fellow who, until 
yesterday, was employed but who today starts his small grocery store or 
gas station. At the same moment employers developed an apoplectic 
hostility towards anyone who favored employee organization and repre- 
sentation and called him a “Bolshevik.” 

Along with the misuse of the word “labor” as meaning a mass of 
people instead of “physical or mental toil” union leaders invented a new 
catch phrase which was pretty glamorous: “labor is not a commodity.” 
Here again you have the meaning of “labor” distorted to represent a 
mass of people . . . and no one could disagree with the premise that 
buying and selling people would be slavery. But “mental and physical 
toil” is obviously rented or hired just as any other commodity and at 
managerial and professional levels as well as unskilled levels. 

Both the leftists and rightists have done a devastating job. More 
through carelessness and laziness than through agreement, their erro- 
neous terminology has been adopted by the press, those in high govern- 
ment office, the courts and, most regrettably, by conciliation agencies 
whose obvious function is to dispel the idea that there is any real class 
struggle in our employer-employee relations problem. Every piece 
of printed matter, from newspapers to company notices on bulletin boards, 
tosses “labor and management” or “labor and industry” phrases into a 
class hatred fire that has little other combustible material in our form of 
life and government. Every day some radio commentator uses such 
phrases as “there appears to be a split in the ranks of labor” or “the 
forces of labor and management are squared away for a finish fight” . . . 
to describe what is actually nothing but a normal difference of opinion in 
the process of negotiation. 

This lurid and false picture of one mass of serfs called “Labor” being 
downtrodden by a small but selfish group called “Management” or “In- 
dustry” is further irritated by such meaningless terms as “labor relations” 
and “industrial relations.” The problem is being still further intensi- 
fied by habitual use of degrading terms such as “worker” instead of the 
more dignified term “employee.” 

If the word “labor” need be used at all it should be confined to its 
correct meaning, to designate the work done. Everyone who works is an 
“employee” of someone or some organization. Our problem is to further 
good relations between that employee and his employer. The mass of 
people who are employed has no better designation than “employees” 
and the mass of those who employ, “employers.” The dividing line be- 
tween them is indeed small and subject to quick change from one side to 
the other. Is it not the ambition of most employees to become an em- 
ployer? 

There is true distinction between the large majority of employees 
who do not care to belong to a union and those who do; the only correct 
terminology for the latter is “unionized employees” and consequently the 
mass terminology is simply “union” or “unions” . . . not “Labor.” 

Employers can accomplish much to promote better employer-employee 
relations by sticking closely to true terminology in all notices and press 
releases, and by using calmer, more dignified terms such as “the company” 
instead of “the management” and “employee” instead of “worker.” 

There are plenty of fundamental, non-inflammatory facts to justify 
and recommend the theory of unionization without resort to instituting 
a spurious class struggle in this nation, born without it. Union leaders 
and employers could well work together to find less inflammatory phrases 
in their written and printed interchanges than “demands” . . . why not 
“proposals”? Why should any contract deal at length with such phrases 
as “grievance procedure” and “Grievance Committee,” etc.? A griev- 
ance is a “wrong.” Why not “Determination Procedure” and “Deter- 
mination Committee”? 

In conclusion it might be pointed out that these recommendations 
for a more intelligent use of words in dealing with this problem have the 
firm foundation of all good psychological practice . . . bring the truth 
to the surface and deal with it instead of being led into a devastating 
mirage of misconception and untruth by use of emotionally toned words. 
Such words are dynamite. 


Received May 21, 1948. 
Early publication. 





A Technique for the Construction of Attitude Scales 


Allen L. Edwards and Franklin P. Kilpatrick 
The University of Washington 


Earlier articles (3, 6) have reviewed the various methods which have 
been used in the construction of attitude scales: the method of equal ap- 
pearing intervals developed by Thurstone (16), the method of summated 
ratings developed by Likert (14), and the method of scale analysis devel- 
oped by Guttman (9). The method of equal appearing intervals and the 
method of summated ratings are similar in that both provide techniques 
for selecting from an initial large number of items, a set of items which 
constitutes the measuring instrument. Scale analysis differs from these 
two methods in that it is concerned with the evaluation of a set of items, 
after the items have been selected in some fashion or another. 

In the method of equal appearing intervals, items of opinion are sorted 
by a judging group into 9 or 11 categories constituting a continuum rang- 
ing from unfavorable to favorable. The scale value of each item is found 
by locating the point on the continuum above which and below which 50 
per cent of the judges place the item. The spread of the judges’ rating is 
measured by Q, the interquartile range. A high Q value for an item indi- 
cates that the judges are in disagreement as to the location of the item on 
the continuum and this, in turn, is taken to mean that the item is ambigu- 
ous. Both Q and scale values are used in selecting items for the attitude 
test. Approximately 20 items with scale values equally spaced along 
the continuum and with low Q values are selected for the test. Scores 
on the test are determined by finding the median of the scale values of 
the items with which a subject agrees. 
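
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration, not the procedure of any particular published test: the item names and scale values below are hypothetical, and a 9-category continuum is assumed.

```python
from statistics import median

# Hypothetical scale values for five items on a 9-category continuum.
SCALE_VALUES = {
    "item_a": 1.4,
    "item_b": 3.2,
    "item_c": 5.0,
    "item_d": 6.8,
    "item_e": 8.5,
}


def thurstone_score(endorsed):
    """Return the median scale value of the items a subject agrees with."""
    if not endorsed:
        raise ValueError("subject endorsed no items")
    return median(SCALE_VALUES[item] for item in endorsed)


# A subject who agrees with the three most favorable items.
print(thurstone_score(["item_c", "item_d", "item_e"]))  # 6.8
```

A subject endorsing an even number of items receives the mean of the two middle scale values, which is the usual convention for a median.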

In the method of summated ratings, items are selected by a criterion 
of internal consistency. Subjects check whether they strongly agree, 
agree, are undecided, disagree, or strongly disagree with each item. 
Numerical weights are assigned to these categories of response using the 
successive integers from 0 to 4, the highest weight being consistently 
assigned to the category which would indicate the most favorable attitude. 
A high and low group are selected in terms of total scores based upon 
the sum of the item weights. The responses of these two groups are then 
compared on the individual items and the 20 or so most discriminating 
items are selected for the attitude test. A subject’s score on this test is 
determined by summing the weights assigned to his responses to the 20 
items. 
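
The item-selection step of the method of summated ratings might be sketched as follows. The text does not say how large the high and low groups are or which statistic is used to compare them; the sketch assumes the top and bottom 25 per cent of subjects and a simple difference between group means. The data and the function name `select_items` are hypothetical.

```python
def select_items(responses, n_keep, frac=0.25):
    """Pick the n_keep items that best separate high and low scorers.

    responses: one list of item weights (0-4) per subject, already keyed so
    that the highest weight marks the most favorable response.
    frac: share of subjects placed in each extreme group (an assumption).
    """
    totals = [sum(r) for r in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    k = max(1, int(len(responses) * frac))
    low, high = order[:k], order[-k:]
    n_items = len(responses[0])

    def group_mean(group, j):
        return sum(responses[i][j] for i in group) / len(group)

    # Discriminating power taken here as high-group mean minus low-group mean.
    diffs = [group_mean(high, j) - group_mean(low, j) for j in range(n_items)]
    ranked = sorted(range(n_items), key=lambda j: diffs[j], reverse=True)
    return sorted(ranked[:n_keep]), diffs


# Eight hypothetical subjects answering four items.
data = [
    [0, 0, 2, 4],
    [1, 0, 2, 0],
    [1, 1, 2, 3],
    [2, 2, 2, 1],
    [3, 2, 2, 4],
    [3, 3, 2, 0],
    [4, 4, 2, 3],
    [4, 3, 2, 1],
]
kept, diffs = select_items(data, n_keep=2)
print(kept)   # [0, 1]
print(diffs)  # [3.0, 3.0, 0.0, 1.5]
```

Item 2, on which every subject gives the same response, shows no difference between the groups and would be dropped first.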

In scale analysis, a complete set of items is tested to determine whether 
they, as a group, constitute a scale in the sense that from the rank order 
score it is possible to reproduce a subject’s response to the individual 
items. The degree to which this is possible is expressed by a coefficient 
of reproducibility.1 Although ordinarily Guttman uses 10 to 12 items, 
to give a simple explanation of this coefficient let us suppose that we have 
but 3 items, each with but 2 categories of response, agree and disagree. 
We shall assume that the agree response, in each instance, represents a 
favorable attitude and the disagree response an unfavorable attitude. 
A weight of 0 is assigned to the disagree response and a weight of 1 is as- 
signed to the agree response. Let us also suppose that for the first item 
we have in our sample 10 subjects with weights of 1 and 90 with weights 
of 0; for the second item we have 20 subjects with weights of 1 and 80 with 
weights of 0; and for the third item we have 40 with weights of 1 and 60 
with weights of 0. 

In the case of perfect reproducibility, the 10 subjects with weights of 
1 on the first item will be the 10 subjects with the highest rank order 
scores. These 10 subjects will also be included in the 20 who have weights 
of 1 on the second item and these 20, in turn, will be included in the 40 
who have weights of 1 on the third item. It would also be true that only 
4 patterns of item response would occur, if the set of items were perfectly 
reproducible. For the sample at hand, these patterns and the scores 
associated with them would be: AAA-3; DAA-2; DDA-1; DDD-0. 
Since all responses could be perfectly predicted from the scores, the coeffi- 
cient of reproducibility, in this instance, would be 100 per cent. Perfect 
reproducibility is seldom found, however, and in practice a coefficient of 
85 per cent or higher is believed satisfactory for judging a set of items to 
be a scale.2 Various techniques for computing the coefficient of repro- 
ducibility have been developed and are described in the articles by 
Festinger (7), Clark and Kreidt (2), and Guttman (11, 12). 
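
The three-item example above can be checked with a short Python sketch. Since several computing techniques exist, the counting rule used here is an assumption and not necessarily that of any of the cited articles: each subject's responses are predicted from his score alone by assuming he agrees with the most frequently endorsed items first, and every response that departs from the prediction counts as one error.

```python
def reproducibility(patterns):
    """Coefficient of reproducibility for 0/1 item weights.

    patterns: one tuple of item weights per subject (1 = agree, 0 = disagree).
    """
    n_items = len(patterns[0])
    # Order items from most to least frequently endorsed.
    popularity = [sum(p[j] for p in patterns) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: popularity[j], reverse=True)
    errors = 0
    for p in patterns:
        score = sum(p)
        predicted = [0] * n_items
        for j in order[:score]:
            predicted[j] = 1
        errors += sum(a != b for a, b in zip(p, predicted))
    return 1 - errors / (len(patterns) * n_items)


# The sample in the text: 10, 20, and 40 of 100 subjects agree with items 1-3.
perfect = ([(1, 1, 1)] * 10 + [(0, 1, 1)] * 10
           + [(0, 0, 1)] * 20 + [(0, 0, 0)] * 60)
print(reproducibility(perfect))  # 1.0

# Hypothetical departure: 5 subjects answer ADD instead of DDA.
imperfect = ([(1, 1, 1)] * 10 + [(0, 1, 1)] * 10 + [(0, 0, 1)] * 15
             + [(1, 0, 0)] * 5 + [(0, 0, 0)] * 60)
print(round(reproducibility(imperfect), 3))  # 0.967
```

Each ADD subject produces two errors against the predicted DDA pattern, giving 10 errors in 300 responses, still above the 85 per cent level mentioned in the text.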

Scale analysis, in the sense mentioned above, thus becomes a tech- 
nique secondary to the problem of item selection.3 The important problem 
is to obtain a set of items which the investigator may have some as- 
surance will scale when a particular technique of testing for scalability is 
applied. Up to the present time, the problem of item selection in scale 
analysis seems to have been left largely to the intuition and experience of 
the investigator. The only practical rules suggested are that one should 
simply rephrase the same question in slightly different ways (7, p. 159) 
or that one should look for items with as homogeneous content as possible 
(12, p. 461). This latter suggestion indicates that if we are interested 
in the problem of attitude toward the Negro, we should break this universe 
of content down into sub-universes constituting perhaps such areas as 
attitude toward the Negro in public eating places, attitude toward the 
Negro as a resident in the community, attitude toward the Negro as a 
voter, attitude toward the Negro as an employer, attitude toward the 
Negro in public conveyances, and so on. But even here, we find that 
attitude toward the Negro, let us say, in public conveyances can be 
broken down into areas of content even more homogeneous by enumera- 
ting the specific conveyances: streetcars, busses, trains, planes, and so on. 
Each of these areas of content might possibly be broken down into still 
more homogeneous areas. Eventually, we may end up, as Festinger sug- 
gests, with multiple rephrasings of the same question, and our two rules 
are thus but one (7, p. 159). 

1 This statistic is explained in the articles by Clark and Kreidt (2), Edwards and 
Kilpatrick (6), Festinger (7), and Guttman (9, 11, 12). 

2 There are other criteria to be applied in determining whether a set of items con- 
stitutes a scale in addition to the coefficient of reproducibility (10, 12). Little has been 
published, however, in which these criteria have been applied empirically to a concrete 
set of data. This may be remedied in the forthcoming volumes by Guttman and his 
associates which are to be published by the Social Science Research Council. So far, 
however, the application of these criteria has simply been mentioned along with the 
theoretical and practical implications. The coefficient of reproducibility has been 
stressed in all of Guttman’s publications, perhaps for the reason that it is considered a 
primary and necessary condition but an insufficient condition for a scale. 

3 This is not to deny the importance of the theory underlying scale analysis. 

Obviously, any technique which enables us to select a set of items 
from the large number of possible items, with some assurance that the set 
of items selected will, in turn, meet the requirements of scale analysis 
would be of great value. In this paper, a technique which has proved 
successful in doing this is described. For reasons which will become clear 
as we proceed, we have called this technique the scale-discrimination 
method of attitude scale construction (5). 


The Scale-Discrimination Technique 


The scale-discrimination method is based upon preliminary investiga- 
tions which showed that the cutting point4 of an item is related to the 
Thurstone scale value of the item and that the reproducibility5 of an item 
is related to the discriminatory power of the item (6). The discrimina- 
tory power of an item, it has also been shown, is not, as might seem at 
first glance, merely a function of the item’s scale value. It can easily be 
demonstrated that items with comparable Thurstone scale and Q values 
may differ tremendously in their power to differentiate between those 
with favorable and those with unfavorable attitudes.6 

4 The cutting point of an item marks the place in the rank order scores of the subjects 
where the most common response shifts from one category (agree) to the next (disagree). 
Between cutting points, in a perfect scale, all responses would fall in the same category. 

5 The reproducibility of an item is measured by the degree to which responses to the 
item can be reproduced from the rank order scores of the subjects. 

Statements of opinion concerning science were collected from a variety 
of sources. Books and essays were consulted. Individuals were asked 
to express their opinions in brief written statements. We eventually 
collected 266 statements of opinion about science. In editing these items, 
particular attention was paid to eliminating those items which: (1) were 
liable to be endorsed by individuals with opposed attitudes; (2) were 
factual or could be interpreted as such; (3) were obviously irrelevant to 
the issue under consideration; (4) appeared likely to be endorsed by every- 
one or by no one; (5) seemed to be subject to varying interpretations for 
any reason; (6) contained a word or words not common to the vocabularies 
of college students. Also, due to emphasis upon the matter during both 
the collecting and editing of the statements, most of the 155 statements 
finally selected expressed a clear-cut favorable or unfavorable opinion 
about science. 

Thirteen other items, which might be called “control” items, were 
added to the original 155. These 13 items were added to determine how 
they would fare at various stages of the scale-discrimination method. 
Of the 13 items, we judged that 7 were “neutral” items in the Thurstone 
sense; 2 were items which could possibly be interpreted as factual; 1 was 
believed to be too extreme for many endorsements; 1 was judged am- 
biguous because the words “scientific holiday” could be interpreted as 
meaning a moratorium or as meaning a celebration; 1 was judged ambig- 
uous because more than one dimension was involved; and 1 was judged 
irrelevant. Thus there were 168 items in all which were used in testing 
the scale-discrimination method of scale construction.7 


Determining Scale and Q Values of the Items 


Envelopes numbered 1 through 110 were prepared. In each envelope 
we placed a set of 3 x 5 cards lettered A, B, C, D, E, F, G, H, I, and a 
pack of slips of paper approximately 2 x 4 inches in size. On each slip 
of paper one of the 168 items was printed along with the number of the 
item. In each case the pack of slips was shuffled so that the items would 
be arranged in no set order. The envelopes were given to an elementary 
psychology class along with a set of instructions describing the Thurstone 
sorting procedure and the members of the class were asked to sort the 
items in accordance with the instructions. 

6 For example, the extreme item: “All Republicans should be executed” would 
undoubtedly show a scale value at one extreme of the continuum and a definitely low 
Q value. But this item will not differentiate between those with favorable and un- 
favorable attitudes toward Republicans for the obvious reason that both groups would 
probably react in the same fashion to the item. 

7 It should be emphasized that the inclusion of the “control” items mentioned is not 
to be considered part of the scale-discrimination procedure. 

The item sortings of each subject were examined and we discarded 
those subjects whose sortings showed obvious reversals of the continuum 
or failure to carry out instructions. On this basis we were left with 82 
completed sets of judgments. 

Frequencies of judgments in each of the 9 categories for each item were 
tabulated, translated into cumulative frequencies, and then into cumula- 
tive proportions.8 An ogive was plotted for each item with cumulative 
proportions on the axis of ordinates and scale values on the axis of ab- 
scissas. Scale values were read to two decimal places (the second decimal 
place being merely an approximation) by dropping a perpendicular to 
the baseline of scale values at the point where the cumulative proportion 
curve crossed the 50 per cent mark. In a similar fashion Q values were 
determined by dropping perpendiculars at the 25th and 75th per cent 
levels, Q being the scale distance between these two points or the inter- 
quartile range.9 
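
The graphical readings described above can be approximated numerically. The sketch below replaces the hand-drawn ogive with straight-line interpolation inside each category, which is a substitution and not the authors' procedure. It also assumes that category k of the 9 covers the interval from k - 1 to k on the baseline, an assumption chosen to agree with the neutral point of 4.5 used later in the paper. The frequencies are hypothetical.

```python
def percentile_point(freqs, p):
    """Baseline point below which a proportion p of the judgments fall.

    freqs: number of judges placing the item in each category, in order.
    Category k (counting from 0) is assumed to span k to k + 1.
    """
    target = p * sum(freqs)
    cum = 0
    for k, f in enumerate(freqs):
        if f > 0 and cum + f >= target:
            # Interpolate linearly inside the category holding the target.
            return k + (target - cum) / f
        cum += f
    return float(len(freqs))


def scale_and_q(freqs):
    """Return (scale value, Q): the median and the interquartile range."""
    scale_value = percentile_point(freqs, 0.50)
    q = percentile_point(freqs, 0.75) - percentile_point(freqs, 0.25)
    return scale_value, q


# Hypothetical sortings of one item by 80 judges into categories A through I.
print(scale_and_q([0, 4, 16, 40, 16, 4, 0, 0, 0]))  # (3.5, 1.0)
```

An item whose judgments were spread evenly over many categories would return a much larger Q and would be a candidate for rejection as ambiguous.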

The 168 items were then plotted in a bivariate distribution according 
to scale and Q values, the scale values being plotted on the baseline. 
The distribution of scale values was bimodal in shape. There were very 
few items in the “neutral” section (none at all in between 5.0 and 5.9), 
the modal categories being 1.0 to 1.9 and 7.0 to 7.9. The Q values of the 
7 items which did fall in the “neutral” scale interval (4.0 to 4.9) were 
quite low, 6 of the 7 falling well below the median Q value for all 168 
items. All 7 of these items were “control” items, described previously. 

A line was drawn through the distribution at approximately the 
median Q value of all the items, 1.29. All items with Q values above 
this point were rejected. We worked from here on with the remaining 
83 items or with approximately the 50 per cent of the initial set of items 
with the least degree of ambiguity as measured by Q. One of the “‘neu- 
tral” control items was eliminated by this standard and 6 were ac- 
ceptable. These 6 items all had scale values between 4.0 and 4.9. No 
items at all had been found in the scale interval 5.0 to 5.9 and the Q 
criterion eliminated all items in the interval 3.0 to 3.9. One of the 2 


* This task was most laborious. Almost 14,000 slips of paper had to be sorted and 
then tabulated. Some judging technique similar to that used by Ballin and Farnsworth 
(1) or Seashore and Hevner (15) would reduce much of this labor, but even here the 
task is not simple. Various methods which simplify the judging process are now being 
tried and will be reported upon in another paper. 

* This operation was simplified by setting up a master chart with the cumulative 
proportions on the Y axis and the scale values on the X axis. This chart was then 
taped to a ground-glass plate which fitted over an enclosed wooden box containing a 
100 watt bulb. Tracing paper could then be placed over this chart and the ogives for 
the individual items quickly drawn. 
factual items was rejected by the Q criterion and the ambiguous item 
with the words “scientific holiday” was also eliminated. The remaining 
10 “control” items would have to be judged acceptable by the Q criterion. 
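
The Q criterion described above amounts to a median split on the Q values. A minimal sketch follows, assuming the Q values are held in a dictionary keyed by item; the item labels and all values except the reported median of 1.29 are invented for illustration.

```python
from statistics import median


def retain_low_q(q_values):
    """Keep items whose Q value does not exceed the median Q of the set.

    q_values: dict mapping a (hypothetical) item label to its Q value.
    Returns the cutoff used and the dict of retained items.
    """
    cutoff = median(q_values.values())
    kept = {item: q for item, q in q_values.items() if q <= cutoff}
    return cutoff, kept


# Five invented items; the middle one sits at the reported median of 1.29.
example_q = {"a": 0.80, "b": 1.10, "c": 1.29, "d": 1.60, "e": 2.05}
cutoff, kept = retain_low_q(example_q)
print(cutoff, sorted(kept))
```

Items "d" and "e" lie above the cutoff and would be rejected as too ambiguous.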


Item Analysis 


The 83 items were prepared in a form suitable for Likert type re- 
actions. Each item was followed by a 6 point forcing scale (strongly 
agree, agree, mildly agree, mildly disagree, disagree, strongly disagree). 
Subjects were instructed to check for each item the one expression which 
most nearly described their own attitude with respect to the item. In 
all, 355 subjects filled out the questionnaire: 245 from sociology, psy- 
chology, and speech classes at the University of Washington; 60 from a 
local junior college; and 50 from a police school. Of these 355 papers, 
346 were usable, 9 of them being incomplete or having more than one 
answer for a single item. 

Scoring was done in the usual Likert fashion, weights of 0 through 5 
being assigned to the 6 response categories, the weight of 5 being given to 
the strongly agree response in the case of items expressing a favorable 
opinion about science, and to the strongly disagree response in the case 
of items expressing an unfavorable opinion about science. For the 6 
items in the scale interval, 4.0 to 4.9, the direction of the weights was 
assigned on the basis of whether the scale value of the item was larger or 
smaller than 4.5. Response weights on the 83 individual items were 
summated for each subject and a frequency distribution plotted for the 
resulting scores. The obtained range of scores was only 64 per cent of the 
possible range (140-405 obtained, 0-415 possible) with considerable 
bunching at the upper (favorable) end of the distribution. 
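
The scoring rule just described may be sketched as follows. The function names are illustrative, and the flag marking an item as favorable or unfavorable to science is assumed to be supplied by the investigator.

```python
RESPONSES = [
    "strongly agree", "agree", "mildly agree",
    "mildly disagree", "disagree", "strongly disagree",
]


def item_weight(response, favorable):
    """Weight from 0 to 5; 5 is always the response most favorable to science."""
    position = RESPONSES.index(response)  # 0 = strongly agree, 5 = strongly disagree
    return 5 - position if favorable else position


def total_score(responses, favorable_flags):
    """Sum the item weights for one subject."""
    return sum(item_weight(r, f) for r, f in zip(responses, favorable_flags))


# A subject who gives the most favorable response to all 83 items
# reaches the maximum possible score of 83 x 5.
best = total_score(["strongly agree"] * 83, [True] * 83)
print(best)
```

The maximum of 415 agrees with the possible range of 0 to 415 given in the text.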

Two criterion groups were chosen, approximately the upper and lower 
27 per cent, in terms of total scores. The range of scores for the lower 
94 papers was from 140 to 300 and the upper 94 papers had scores ranging 
from 343 to 405. The 83 items were then subjected to item analysis. 
For each item, frequencies in each of the response categories for the high 
and for the low group were tabulated. The 6 categories were then reduced 
to 2 by combining categories 0, 1, 2, 3, and 4.¹⁰ From the resulting 
2 × 2 tables, phi coefficients were calculated. The phi coefficients 
ranged in size from .16 to .78. 
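
The phi coefficient for a single item can be computed directly from the 2 × 2 table of criterion group by dichotomized response. The sketch below uses the standard formula; the cell frequencies are invented, and only the group size of 94 is taken from the text.

```python
import math


def dichotomize(weights, cut=5):
    """Count responses at the top weight versus all lower weights (0-4 combined)."""
    top = sum(1 for w in weights if w >= cut)
    return top, len(weights) - top


def phi_coefficient(a, b, c, d):
    """Phi for the table [[a, b], [c, d]].

    a, b: high group, top response / other responses.
    c, d: low group, top response / other responses.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0
    return (a * d - b * c) / denom


# Invented item: 70 of the 94 high scorers and 20 of the 94 low scorers
# gave the response weighted 5.
phi = phi_coefficient(70, 24, 20, 74)
print(round(phi, 2))
```

An item of this kind would have a phi of about .53, within the range of .16 to .78 reported for the 83 items.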


10 This grouping was necessary because our subjects gave predominantly favorable 
responses to the items. If our universe of content had been attitude toward labor 
unions, we would expect a more symmetrical distribution of responses and consequently 
a different grouping of categories. 

11 The nomographs by Guilford (8) or the tables prepared by Jurgensen (13) make 
these calculations quite simple. 
Next the 83 items were plotted in a bivariate distribution with phi 
values on the Y axis and scale values on the X axis.¹² The 4 items with 
the highest phi coefficients were selected from each half-scale interval; 
due to the previously mentioned gaps in the scale continuum, this in- 
volved only the intervals from .5 to 2.5 and from 6.5 to 8.0. No items 
were selected from the “neutral” control items in the scale interval 4.0 
to 4.9. The 28 items thus selected were assigned to Forms A and B of 
the questionnaires by alternating scale values between the two forms. 
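
The selection and assignment steps can be sketched as below. The tuple layout and function names are hypothetical, and the example items are invented; the sketch assumes the caller has already removed any items, such as the “neutral” controls, that are not to be considered.

```python
from collections import defaultdict


def select_items(items, per_interval=4, width=0.5):
    """Pick the `per_interval` highest-phi items in each half-scale interval.

    items: list of (item_id, scale_value, phi) tuples.
    """
    bins = defaultdict(list)
    for item in items:
        bins[int(item[1] // width)].append(item)
    chosen = []
    for key in sorted(bins):
        ranked = sorted(bins[key], key=lambda it: it[2], reverse=True)
        chosen.extend(ranked[:per_interval])
    return chosen


def split_forms(chosen):
    """Alternate items, ordered by scale value, between Form A and Form B."""
    ordered = sorted(chosen, key=lambda it: it[1])
    return ordered[0::2], ordered[1::2]


# Invented items: five in the interval 1.0-1.4 and one in 1.5-1.9.
pool = [("a", 1.0, 0.30), ("b", 1.1, 0.70), ("c", 1.2, 0.50),
        ("d", 1.3, 0.60), ("e", 1.4, 0.40), ("f", 1.6, 0.65)]
form_a, form_b = split_forms(select_items(pool))
print([i[0] for i in form_a], [i[0] for i in form_b])
```

Alternating by scale value is what keeps the two forms closely equated on the Thurstone continuum.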

The final scales then consisted of 14 items each, with the items very 
closely equated as to Thurstone scale values, Q values, and phi values. 
For Forms A and B, respectively, the mean scale values of the 14 items 
were 3.85 and 3.91; the mean Q values were .90 and .92. Phi coefficients 
of the items in Form A ranged from .58 to .78 with a median value of 
.65; for Form B they ranged from .58 to .76 with a median value of .66. 
Only 1 of the remaining 10 “control” items had a phi value above .58. 
This was one of the 6 “neutral” items and it had a phi value of .61. The 
other “control” items would be rejected by the phi criterion. 


Reliability and Reproducibility of the Scale 


The reliability coefficient of the two forms of the scale, 14 items versus 
14 items, based upon the responses of 248 new subjects was .81, uncor- 
rected. For both forms of the test the range of scores was quite re- 
stricted, 30 to 70 in each case with possible ranges from 0 to 70. Within 
this restricted range, bunching at the upper, or favorable, end was present. 
The mean score for Form A was 58.22 and the standard deviation was 
7.33. For Form B the mean was 57.20 and the standard deviation 
was 7.79. 
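
The coefficient reported above is a correlation between scores on the two forms. The sketch below computes such a correlation and, as an assumption about what “uncorrected” refers to, also shows the Spearman-Brown step-up for a test of doubled length. The paper does not state which correction, if any, was contemplated.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r, factor=2):
    """Estimated reliability when test length is multiplied by `factor`."""
    return factor * r / (1 + (factor - 1) * r)


# Applying the step-up to the reported coefficient of .81 (14 vs. 14 items)
# estimates the reliability of the combined 28 items.
combined = spearman_brown(0.81)
print(round(combined, 3))
```

Under that assumption the 28 items taken together would have an estimated reliability of about .90.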

Scale analysis based upon the performance of a sample of 87 subjects 
drawn from the larger group of 248 subjects was carried out with both 
forms of the test by the Cornell technique (11). A coefficient of repro- 
ducibility of 87.5 per cent was obtained for Form A and a coefficient of 
reproducibility of 87.2 per cent was obtained for Form B. Response 
categories in each instance were dichotomized. Cutting points were 
established and we observed Guttman’s rule that “no category should 
have more error than non-error” (11, p. 17). The range of modal re- 
sponse categories was from .51 to .82 for Form A. The mean value of 


12 A plot of phi values against Q values indicated no discernible relationship, the 
variability within columns being approximately the same as the total variability. This 
would indicate that in the procedure followed here, the scale-discrimination procedure, 
the phi analysis adds to the process of item selection when items with comparable Q 
values are used. We have, it may be recalled, already eliminated the 50 per cent of the 
items with the highest Q values. The relationship between the discriminatory power 
of an item and Q value when this is not the case is described in another paper (4). 
the modal categories, .57, which is the minimum value¹³ of the coefficient of 
reproducibility for this set of items with the sample at hand, may be 
compared with the observed coefficient of reproducibility of 87.5 per cent. 
For Form B the range of the modal categories was from .52 to .67. The 
mean value, which again is the lower limit of the coefficient of reproduci- 
bility, was .57, whereas the observed value of the coefficient of reproduci- 
bility was 87.2 per cent. 
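
The two quantities compared above can be illustrated on dichotomized responses. The sketch below is a simplification and not the Cornell technique itself: it predicts each subject's response pattern from the rank order score alone and counts every mismatch as an error, whereas the Cornell technique sets cutting points by inspection. The response matrix is invented.

```python
def reproducibility(matrix):
    """1 minus the proportion of responses not reproduced from total scores.

    matrix: list of rows, one per subject, each a list of 0/1 item responses.
    """
    n_subjects, n_items = len(matrix), len(matrix[0])
    popularity = [sum(row[j] for row in matrix) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: popularity[j], reverse=True)
    errors = 0
    for row in matrix:
        score = sum(row)
        for rank, j in enumerate(order):
            predicted = 1 if rank < score else 0
            errors += predicted != row[j]
    return 1 - errors / (n_subjects * n_items)


def minimum_reproducibility(matrix):
    """Mean proportion in the modal category, the lower limit of reproducibility."""
    n = len(matrix)
    modal = []
    for j in range(len(matrix[0])):
        p = sum(row[j] for row in matrix) / n
        modal.append(max(p, 1 - p))
    return sum(modal) / len(modal)


perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
flawed = perfect + [[0, 1, 0]]  # one subject out of scale order
print(reproducibility(perfect), round(reproducibility(flawed), 3))
print(round(minimum_reproducibility(perfect), 3))
```

The gap between the observed coefficient and the mean modal proportion is what shows that the reproducibility is not merely a product of lopsided marginals.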

The two observed values of the coefficient of reproducibility are suf- 
ficiently high to constitute evidence that but a single dominant variable 
is involved in the sets of items or that, in other words, uni-dimensionality 
is present. Such sets of items are said to be scalable or to constitute a 
scale. The coefficients of reproducibility also mean that it is possible to 
reproduce item responses from rank order scores with the accuracy indi- 
cated by the value of the coefficients. 

The error of reproducibility which is present is simply 1.00 minus the 
observed coefficient of reproducibility. If the error of reproducibility can 
be assumed to be random, then these sets of items possess an important 
property: the simple correlation between rank order scores and an ex- 
ternal criterion will be equal to the multiple correlation between the items 
and the external criterion (10). This, in turn, means that efficiency of 
prediction is maximized by the simple correlation. 

It would also be true in the case of sets of items which meet the criteria 
demanded of scales¹⁴ that the interpretation of the rank order scores is 
unambiguous and that it is possible to make meaningful statements about 
one subject being higher (more favorable) than another on the variable 
in question.¹⁵ This would not be true of a test involving more than one 
variable. Suppose, for example, a test involves two variables. Then a 
subject might obtain a given score by being high on one variable and low 
on the other. Another subject might obtain the same score by being high 
on the second variable and low on the first. From the rank order scores 
alone it would be impossible to tell the relative positions of the subjects 
on the two variables, and the interpretation of the composite score is 
ambiguous. Statements of “higher and lower than” might be made, but 
we would not know what the “higher and lower than” referred to, 
for by increasing or decreasing the number of items related to either 


13 This is the lower limit because the reproducibility of any single item cannot be 
less than the frequency in the modal category. The method of computing the minimum 
value of the coefficient assumes independence of the items. See Guttman (12). 

14 See footnote 2. 

15 In the case of perfect scales, where the coefficient of reproducibility is unity, it 
also follows that an individual with a low rank order score will not have given a more 
favorable response to any item than any person with a higher rank order score. 
variable, the rank order scores of the subjects could be altered.¹⁶ This 
would not be true of a test in which the items all belong on a single 
continuum, that is, a test which is uni-dimensional. In such a test, 
increasing the number of items would not shift the rank order scores of the 
subjects. 


Summary 


The method of scale construction described in this paper has been 
called the scale-discrimination method because it makes use of Thur- 
stone’s scaling procedure and retains Likert’s procedure for evaluating the 
discriminatory power of the individual items. Furthermore, the items 
selected by the scale-discrimination method have been shown, in the case 
described, to yield satisfactory coefficients of reproducibility and to meet 
the requirements of Guttman’s scale analysis. The scale-discrimination 
method is essentially a synthesis of the methods of item evaluation of 
Thurstone, Likert, and Guttman. It also possesses certain advantages 
which are not present in any of these methods considered separately. 

The scale-discrimination method, for example, eliminates the least 
discriminating items in a large sample, which Thurstone’s method alone 
fails to do. The unsolved problem in the Thurstone procedure is to 
select from within each scale interval the most discriminating items. 
Items within any one scale interval may show a high degree of variability 
with respect to a measure of discrimination. For example, we found 
within a single interval items with phi values ranging from .24 to .78. 
That Thurstone’s criterion of Q does not aid materially in the matter of 
selecting discriminating items is indicated by the plot of phi values against 
Q values, after the 50 per cent of the items with the highest Q values had 
already been rejected. Under this condition, items with Q values from 
1.00 to 1.09 had phi coefficients ranging from .32 to .76. Thurstone’s 
method also, by the inclusion of “neutral” items, tends to lower reliability 
and to decrease reproducibility of the set of items finally selected (6). 

Thus when selecting items by Thurstone’s technique alone, we have 
no basis for making a choice between items with comparable scale and 
Q values, and yet these items are not equally valuable in the measurement 
of attitude. By having available some measure of the discriminatory 
power of the items, the choice becomes objective as well as advantageous 
as far as the scale itself is concerned.¹⁷ 


16 We do not mean to imply by this discussion that multi-dimensional scales are 
without value. 

17 Additional research may indicate that the Thurstone scaling procedure is not 
necessary. See, however, the articles by Edwards and Kilpatrick (6) and Clark and 
Kreidt (2). 
The advantage of the scale-discrimination method over the Guttman 
procedure lies essentially in the fact that we have provided an objective 
basis for the selection of a set of items which are then tested for scalability. 
It may happen that not always will the scale-discrimination method 
yield a set of items with a satisfactory coefficient of reproducibility. But 
this is not an objection to the technique any more than the fact that not 
always will a set of intuitively selected items scale. Rather, it seems that 
the scale-discrimination method offers greater assurance of scalability 
than any intuitive technique such as applied by Guttman. Furthermore, 
the set of items selected by the scale-discrimination technique provides 
a wider range of content than do the intuitive Guttman items. In the 
scale-discrimination method, we obtain items which are not essentially 
multiple phrasings of the same question as is often true when the selection 
of a set of items to be tested for scalability is left to the experience of the 
investigator (7, p. 159). 

Several different areas of content are now being studied by variations 
of the scale-discrimination method and the results of these researches 
should provide additional evidence concerning the relationship between 
the scale-discrimination method and scale analysis. 


Received January 2, 1948. 
A College Achiever and Non-Achiever Scale for the 
Minnesota Multiphasic Personality Inventory 


William D. Altus 
University of California, Santa Barbara College 


While working with illiterate soldiers as a personnel consultant during 
the late war, the writer found that an adjustment test, devised for other 
purposes, had considerable validity for predicting “academic” achievement, 
if what these men of low intellectual caliber learned to do in terms 
of literacy may be called “academic” (1). The adjustment of these 
unlettered men, as determined by a 36-point, orally-administered test, was 
just as important as their intelligence, as determined by the Army 
Wechsler, in differentiating between those soldiers who would graduate 
and those who would fail and be sent home. This finding is noteworthy 
only because it is at variance with previous studies, of which Bell’s (2) 
may be considered as typical, where the correlations obtained between 
adjustment tests and grade averages did not deviate significantly from 
zero. The present study is one more attempt to find some significant 
relationships between the way college students respond to adjustment 
items and the type of grade average which they earn, intelligence being 
held constant. 

The method of equated groups was used, the basis of the equating 
being the standard scores earned on the Altus Measure of Verbal Aptitude. 
This measure of aptitude was considered adequate for the purpose since 
it had previously given a validity coefficient of .64 with elementary 
psychology grades. The population from which the two groups were 
drawn consisted of two classes in elementary psychology at the Santa 
Barbara College, University of California, during the spring semester of 
1947. The average standard score on the first two semester tests in 
psychology was used as a measure of academic achievement. Students 
were accepted for pairing if they met the following criteria: (1) If they 
had a score within two standard score points of each other on the Measure 
of Verbal Aptitude; (2) if one of the pair had an average standard score 
on the first two psychology tests at least one-half sigma above his standard 
score on the Aptitude Test; (3) and if the other of the pair had a com- 
parable score at least one-half sigma below his measured aptitude. 
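
The three pairing criteria can be sketched as follows. Since the standard scores have a sigma of 10, one-half sigma is taken as 5 points. The greedy matching order and the student records are assumptions made for illustration; the paper does not say how candidates were matched when several partners were available.

```python
def pair_students(students, half_sigma=5.0, tolerance=2.0):
    """Pair an over-achiever with an under-achiever of similar aptitude.

    students: list of (name, aptitude, grade) in standard scores.
    Returns a list of (achiever, non_achiever) name pairs.
    """
    over = [s for s in students if s[2] - s[1] >= half_sigma]
    under = [s for s in students if s[1] - s[2] >= half_sigma]
    pairs, used = [], set()
    for a in sorted(over, key=lambda s: s[1]):
        for b in sorted(under, key=lambda s: s[1]):
            if b[0] not in used and abs(a[1] - b[1]) <= tolerance:
                pairs.append((a[0], b[0]))
                used.add(b[0])
                break
    return pairs


# Invented records: C works at capacity and so qualifies for neither group.
roster = [("A", 50, 60), ("B", 51, 42), ("C", 60, 62),
          ("D", 40, 30), ("E", 41, 47)]
matched = pair_students(roster)
print(matched)
```

A student whose grade lies within one-half sigma of aptitude, like C here, drops out of the study altogether.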

Table 1 shows clearly that the two groups of 25 students each were 
well equated on the basis of general aptitude. From the aptitude sigma 
Table 1 
Certain Statistical Data on the Two Equated Groups 

                     Achievers          Non-Achievers 
Measure            Mean     Sigma      Mean     Sigma      C.R. 
Aptitude           51.48*    6.85      51.52*    6.77        - 
Psych. Grade†      60.30*    4.60      41.86*    5.38      12.99 
H.P. Ratio**        1.99      .57       1.16      .32       6.38 

* Standard scores, mean of 50, sigma of 10 for the total population. 

† Psychology grade: Final average standard score in elementary psychology, based 
on three semester tests and one final examination. 

** Honor point ratio, in which A = 3, B = 2, C = 1, D = 0, and F = -1. Ratio 
represents all college work taken. 


it will be noted that the two groups are less variable in aptitude than the 
population from which they came, i.e., the two classes in elementary 
psychology, since the sigma is considerably smaller than ten. This dif- 
ference is simply due to the fact that it is impossible to vary too far below 
an already low aptitude in terms of grade achieved; similarly those quite 
high in aptitude could not meet the criterion of being one-half sigma 
above their tested aptitude. For these reasons it was impossible to in- 
clude the relatively quite dull or the very bright in the study. 

It will be noted from Table 1 that the two groups were almost two 
sigmas apart in average psychology grade. Although of the same general 
aptitude, one group earned an average grade of B in elementary psy- 
chology and the other group fell at the dividing point between a C and a 
D. The critical ratio of this difference in average psychology grade is 
12.99. The average grade earned in all subjects taken in college does 
not show quite the same divergence for the two groups, 1.99 and 1.16. 
One group, here called the achievers, was earning an approximate B 
average in college while the other, the non-achieving group, was earning 
only slightly better than the required C average. The critical ratio of the 
differences in average college grades is, however, 6.38, showing that a 
statistically significant difference did exist. 
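
A critical ratio of this kind is the difference between two means divided by the standard error of that difference. The sketch below assumes the common formula for independent groups; the paper does not state which formula was used, and because the groups were matched a different one may have been applied, so the results are close to the reported 12.99 and 6.38 without reproducing them exactly.

```python
import math


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between means over its standard error (independent groups)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Means and sigmas from Table 1, with 25 students in each group.
cr_psych = critical_ratio(60.30, 4.60, 25, 41.86, 5.38, 25)
cr_honor = critical_ratio(1.99, 0.57, 25, 1.16, 0.32, 25)
print(round(cr_psych, 2), round(cr_honor, 2))
```

Either ratio is far beyond the value of about 2.6 usually required for significance at the .01 level.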

In the present study the only factor held constant was general apti- 
tude. It would have been desirable, perhaps, to hold sex and age con- 
stant but the population did not afford a sufficient pool of students for 
any further matching. In the group working above capacity (the 
“Achievers”) there were 22 men and 3 women; for those working below 
their capacity (the “Non-Achievers”) there were 16 men and 9 women. 
It is rather striking that three times as many women were in the non- 
achieving group as there were in the achieving group. It is probable that 
the competition engendered by the G. I. Bill and the relatively greater 
maturity of the male students, who average 4.5 years older, are responsible 
for the sex differences between the two small groups here studied. 

The 50 members of the two groups were administered the group form 
of the Minnesota Multiphasic Personality Inventory, hereafter called the 
MMPI. It was felt that the wide range of scales and individual items 
afforded by this test might reveal some significant differences between 
the two groups, these differences being in some way associated with non- 
intellective factors of etiological significance in grade getting. The 
differences in the average scores of the two groups on the various MMPI 
scales are presented in Table 2. It will be seen that the four non- 


Table 2 

Mean Differences Between the Achieving and Non-Achieving Groups 
on the Various Scales of the MMPI 

                       Non-                            Non- 
         Achievers   Achievers            Achievers   Achievers 
Scale      Mean        Mean      Scale      Mean        Mean 

?          50.00       50.00      Pd        58.00       58.80 
L          51.84       53.64      Mf        56.56       53.84 
K          59.16       59.08      Pa        53.48       55.40 
F          52.60       53.56      Pt        56.24       57.56 
Hs         52.76       53.20      Sc        58.32       60.92 
D          51.20       54.32      Ma        54.28       61.18 
Hy         58.04       59.40 





clinical scales, ?, L, K and F, show mean scores that are quite alike. 
Three of these non-clinical scales, ?, L and F, are very close to the norma- 
tive standard score of 50. The K scale, however, while it does not show 
any difference between the two groups in mean scores, does show a rather 
marked elevation in average score for both groups, some nine-tenths of a 
standard deviation above the MMPI norms. The usual interpretation 
of an elevated K scale is that it represents a defensive test-taking at- 
titude, either conscious or unconscious. Rather interestingly, however, 
Meehl and Hathaway (4) have shown that this scale is to a certain degree 
correlated with educational level, college students tending, on the average, 
to make higher mean scores than is true of the entire population. Part 
of the exaggeration in mean K score for the two groups is doubtless due 
to this educational factor. It may also be possible that motivational 
factors operating in markedly over- and under-stimulated college stu- 
dents—in terms of scholastic achievement—are associated in some degree 
with defensive test-taking attitudes such as would be reflected in an 
elevated K score. Until norms for stratified samples of college students 
are available on the MMPI, it will be fruitless to carry conjecture any 
further concerning the etiology of such anomalies as here appear on the 
K scale. 

The most noteworthy finding in Table 2 is the marked similarity be- 
tween the two groups in mean score on the various clinical scales. Some 
of the scores are somewhat elevated, as was the K score, but when the 
score for one group is elevated, it tends to be elevated to about the same 
degree for the other group. The trend for eight of the clinical scales is in 
the expected direction—that is, for greater maladjustment on the part 
of the non-achieving group. Only for the Mf scale is the direction 
reversed; here the difference may be somewhat suspect because of the 
disparity between the number of men and women composing the two 
groups. Relatively high means for the two groups may be noted on at 
least three of the clinical scales, Hy, Pd and Sc. The height of two of 
these three scales, Sc and Pd, may be in part accounted for by the rela- 
tively high scores on the K scales, since the size of K determines the size 
of the correction to be applied to Sc and Pd; thus these two scales are in 
part a function of K. The implicit cautiousness apparent in the height- 
ened K scale average is not reflected, though, in the otherwise rather high 
Hy averages, 58.04 and 59.40, since K is not used as a correction on this 
clinical scale. One explanation for the elevated means of the two groups 
on the Hy scale is again the educational artifact noted in the preceding 
paragraph in the discussion of the K scale—college students tend generally 
to earn higher scores on the Hy scale than is true of the general population. 
Whatever the cause or causes for the deviations of the two groups 
from the normative score of 50, however, it will be noted that their 
averages remain quite safely within the bounds of statistical normality. 
And since neither the normality nor the abnormality, per se, of the two 
groups here studied is the purpose of the present investigation, the 
etiology of the mean score deviations will be discussed no further. 

On one of the MMPI scales there is a significant difference between 
the groups in mean score. On the Ma (Hypomania) scale there are 6.9 
standard points difference in the means of the groups. The difference 
converts into a critical ratio of 2.96, significant at the .01 level. None 
of the other differences on the remaining eight clinical scales is significant 
even at the .05 level. When it is remembered that, excepting for the Mf 
scale, all of the differences are in favor of the greater maladjustment of 
the non-achieving students, it may be assumed, despite the lack of statis- 
tical certainty, that probably a true difference, even though slight, may 
obtain between the two groups. Insofar as the Ma scale possesses 
validity, the data would seem to suggest that the overactive, restless, try- 
too-many-things type of person is a somewhat poorer student, on the average, 
than is the better controlled, less active fellow student. It appears 
unlikely that there is any connection between this finding and that of 
Harris (3), who discovered that, among other scales, the Ma differentiated 
between neurotics who were amenable to brief psychotherapy and those who 
were not. It may be of interest on the anecdotal level, if no 
other, that one of the non-achievers who is in the top twelve per cent of 
local college students in verbal intelligence received an Army discharge 
for psychoneurosis. After five years of successful service as a non-com 
in the Regular Army, he was cashiered from an OCS for striking an 
officer, went into a manic state for a week, for which amnesia is still total, 
and spent six months in a neuropsychiatric ward before his discharge. His 
score on the Ma scale of the MMPI is the highest of all 50 students and 
is higher than for any other scale on his own profile. Whether he was the 
only neuropsychiatric casualty among the veterans in the two groups 
studied is unknown; the background data in his case came to light only 
because he was worried about his MMPI profile when it was explained to 
him. He then vouchsafed his personal history. 

In terms of individual scores on the Ma scale, 13 of the non-achieving 
group had scores of 60 or higher while the corresponding number for the 
achieving group was 6. Four of the non-achieving group had scores of 
70 or higher on this scale; only one of the achieving group reached 70. 
If these data are representative of college students as a whole, it may be 
assumed that the chances of any individual student with an Ma score of 
60 or higher working up to capacity in academic subjects are two to one 
against. The number of students in the present study is so small and the 
possibility of a biased sampling so great that even so tentative an actua- 
rial assumption must be buttressed by further research before it can be 
accepted. 
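
The “two to one” figure follows from the counts just given. As a small check, and using only those counts (the function name is illustrative):

```python
from fractions import Fraction


def odds_against(non_achievers, achievers):
    """Odds that a student above the cutoff falls in the non-achieving group."""
    return Fraction(non_achievers, achievers)


# 13 non-achievers and 6 achievers had Ma scores of 60 or higher.
odds = odds_against(13, 6)
share = 13 / (13 + 6)
print(odds, round(float(odds), 2), round(share, 2))
```

Roughly two in three of the 19 students with Ma scores of 60 or higher were non-achievers, which is the basis for odds of about two to one.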

A further test was taken in order to determine whether certain indi- 
vidual items among the 567 in the MMPI could be isolated by item 
analysis and fused together to form a partial measure of the non-intel- 
lective factors in grade-getting. A tabulation was made of all “Yes” 
answers to each of the items in the MMPI for the two groups. Since 
there were only 25 in each group, it was felt that a difference of 5 on any 
one item was great enough to justify the inclusion of the item in a new 
scale. The 60 items which showed a difference of 5 or more “Yes” 
answers are shown in Table 3. 
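
The item selection rule can be sketched as below. The first three sets of counts are taken from Table 3; the fourth item and its counts are invented to show an item that would be passed over. The function name and data layout are illustrative only.

```python
def discriminating_items(yes_counts, threshold=5):
    """Select items whose "Yes" counts differ by `threshold` or more.

    yes_counts: dict mapping item text to (non_achiever_yes, achiever_yes).
    Returns a dict mapping each selected item to the group answering "Yes" more often.
    """
    selected = {}
    for item, (non_ach, ach) in yes_counts.items():
        diff = non_ach - ach
        if abs(diff) >= threshold:
            selected[item] = "non-achievers" if diff > 0 else "achievers"
    return selected


counts = {
    "I like to flirt.": (18, 10),
    "I dream frequently.": (12, 7),
    "My conduct is largely controlled by the customs of those about me.": (11, 16),
    "Hypothetical item with a small difference.": (12, 10),
}
keyed = discriminating_items(counts)
for text, group in keyed.items():
    print(group, "|", text)
```

With only 25 students in each group a difference of 5 answers is a lenient standard, which is why the starred items in Table 3, those holding up in a new group, carry more weight.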

It will be noted from Table 3 that the non-achievers answered “Yes” 
more frequently to the first 42 items while the achievers answered “Yes” 
more often to the last 18 items. The whole range of 60 items represents a 
grab bag of symptoms which accord with no clearly defined syndrome. 
However, the feminine cast of the non-achievers—perhaps as a result of 
Table 3 

Items in the Group MMPI Which Discriminated Between the Achieving and 
Non-Achieving Groups by Five or More Points 

“Yes” 
Answers    Item 

18-11†     *1. I do not mind being made fun of. 
10-4       *2. I like collecting flowers or growing house plants. 
13-7       *3. When I was a child I belonged to a crowd or gang that tried to stick 
               together through thick and thin. 
18-10      *4. I like to flirt. 
17-8       *5. I have very few fears compared to my friends. 
10-5       *6. Sometimes I become so excited I find it hard to go to sleep. 
8-3        *7. Sometimes some unimportant thought will run through my mind and 
               bother me for days. 
10-5       *8. I wish I could get over worrying about things I have said that may have 
               injured other people’s feelings. 
9-2        *9. I like to keep people guessing what I’m going to do next. 
14-8      *10. At times I have worn myself out by undertaking too much. 
12-7      *11. I dream frequently. 
7-1       *12. Usually I would prefer to work with women. 
18-12     *13. I can remember playing sick to get out of something. 
23-14     *14. While in trains, buses, etc., I often talk to strangers. 
7-2       *15. I have a daydream about life which I do not tell to other people. 
22-15     *16. I usually work things out for myself rather than get someone to show 
               me how. 
19-14     *17. I strongly defend my own opinions as a rule. 
13-5      *18. A large number of people are guilty of bad sexual conduct. 
20-12     *19. The one to whom I was most attached and whom I most admired as a 
               child was a woman. (Mother, sister, aunt or other woman.) 
8-3        20. I work under a great deal of tension. 
21-16      21. My sex life is satisfactory. 
10-1       22. I have had very peculiar and strange experiences. 
19-14      23. I am a good mixer. 
9-4        24. I have never done anything dangerous for the thrill of it. 
16-11      25. I like to cook. 
6-0        26. I have the wanderlust and am never happy unless roaming or traveling 
               about. 
11-5       27. It wouldn’t make me nervous if any members of my family got into 
               trouble with the law. 
16-11      28. I very much like hunting. 
15-5       29. I never worry about my looks. 
11-6       30. If I were an artist I would like to draw flowers. 
6-1        31. Most people make friends because friends are likely to be useful to them. 

* The items preceded by an asterisk showed discrimination value as non-intellective 
items when administered to a new group and with a new criterion: average of all grades 
earned in college. 

† The number given first is the number of “Yes” answers marked by the Non- 
Achievers; the second number is the number of “Yes” answers for the Achievers. 
Table 3 (Continued) 

“Yes” 
Answers    Item 

19-11      32. I like Alice in Wonderland by Lewis Carroll. 
23-18      33. I have no dread of going into a room by myself where other people have 
               gathered and are talking. 
23-16      34. I am not afraid of fire. 
6-1        35. I often think I wish I were a child again. 
8-3        36. I am apt to take disappointments so keenly that I can’t put them out 
               of my mind. 
13-4       37. If given the chance, I would make a good leader of people. 
24-16      38. I enjoy social gatherings just to be with people. 
24-15      39. Except by a doctor’s orders I never take drugs or sleeping powders. 
10-3       40. I am fascinated by fire. 
17-11      41. If I were in trouble with several friends who were equally to blame, I 
               would rather take the whole blame than to give them away. 
20-15      42. I have no fear of spiders. 
9-14       43. I am apt to pass up something I want to do when others feel that it isn’t 
               worth doing. 
4-10       44. When I was a child, I didn’t care to be a member of a crowd or gang. 
13-18      45. I like to read newspaper editorials. 
8-13       46. One or more members of my family is very nervous. 
4-9        47. I am often so annoyed when someone tries to get ahead of me in a line 
               of people that I speak to him about it. 
0-8        48. I am not likely to speak to people until they speak to me. 
           49. I sweat very easily, even on cool days. 
           50. I can read a long while without tiring my eyes. 
           51. Most nights I go to sleep without thoughts or ideas bothering me. 
           52. Most people will use somewhat unfair means to gain profit or an advan- 
               tage rather than to lose it. 
           53. During one period when I was a youngster I engaged in petty thievery. 
           54. I feel that it is certainly best to keep my mouth shut when I am in 
               trouble. 
           55. I like to read about science. 
4-9        56. I sometimes find it hard to stick up for my rights because I am so 
               reserved. 
2-7        57. At parties I am more likely to sit by myself or with just one other person 
               than to join in with the crowd. 
14-19     *58. I have been quite independent and free from family rule. 
14-19      59. I like science. 
11-16     *60. My conduct is largely controlled by the customs of those about me. 





the differential representation of the sexes in the two groups—is quite 
noticeable: Those who work considerably under their capacity like to 
cook (25), would like to collect flowers (2) or draw flowers (30), like to 
work with women (12) and as a child loved some individual woman most
of all (19). Immaturity is also present: The non-achievers like to keep
people guessing what they’re going to do next (9), often wish they were
a child again (35), have daydreams which they do not tell other people 
(15). Relative fearlessness is also claimed by the non-achievers: They 
are not afraid of spiders (42) or of fire (34), claim that they have fewer 
fears than their friends (5). Self-assertiveness is another of their charac- 
teristics: The non-achievers work things out for themselves, they say (16), 
strongly defend their own opinions (17), feel that they would make good 
leaders of people if given the chance (39). Manic trends are also ap- 
parent: They admit to occasional excitement so great that sleep is im- 
possible (6), to liking to travel so much that happiness is a concomitant 
of traveling or roaming about (26), to having worn themselves out at 
times by attempting too much (10). Femininity, immaturity, fearlessness,
self-assertiveness and manic tendencies are, then, certain descriptive
adjectives which appear to characterize the answers¹ of the non-achievers
to differentiating items in the MMPI. These trends are, however,
really minimal compared with the strength of the variable which,
for want of a better term, will be called social extroversion. The non- 
achiever belonged to a crowd (3) or gang as a child, would take the whole 
blame if his crowd got into trouble (41), likes to talk to people on trains 
and buses (14), likes to be in social gatherings just to be with people (38), 
is not disturbed by entering a room where people are talking (33), never 
worries about his looks (29), though when he does worry it is usually 
about a social matter, i.e., something he said that may have offended 
others (8). 

In an obverse way, the social variable also characterizes the achiever 
who, academically, is working above his capacity: He sits by himself or 
with just one other person when at a party (57), does not speak to people 
until he is spoken to (48), is annoyed by those who get ahead of him in a 
line of people (47), feels people would use unfair means to get ahead (52), 
is so reserved he finds it difficult to defend his own rights (56). Less 
social, more reserved, the achiever is also characterized, in his marking of 
the group MMPI, by opposite tendencies from those inferred in the 
preceding paragraph for the non-achiever—that is, he is more mature in 
his attitudes, less feminine, not so socially assertive, though he claims to 
be free and independent of family rule, untroubled by manic tendencies 
and admits to more fears than the non-achiever. One feels that the 
differential between the two groups here considered fits fairly well the 
stereotype of the introvert and extrovert, though not forgetting that the 
60 items in Table 3 contain too many diverse characteristics to allocate 
to a single continuum without making an unwieldy string of poorly
matched beads. With this word of caution in mind, it may be said that
the present data show the one who works markedly under his capacity in
college to be an immature, somewhat manic social extrovert while his
opposite number who works above his capacity is a rather aloof,
well-controlled introvert.

1 This statement is worthy of being emphasized by a footnote: Note that the wording
is “characterize the answers,” not “characterize the personalities.” Operationally, the
behavior studied is the marking of items.

The discussion relative to the discriminating items of the MMPI has 
thus far been concerned with the characteristics of the two groups as 
inferred from the way they responded to the individual items. It is 
clear from the manner of their derivation that the items in Table 3 
should show quite divergent means for the two groups. The mean scores
of the two groups on the 60 items, scoring one point for each minus answer
on items 1 through 42 and one point for each plus answer on items 43
through 60, are 24.6 for the non-achieving group and 39.6 for the
achieving group. The question
remained, however, as to how efficacious the present scoring would be if 
applied to a new population. Consequently, the 60 items were tried out 
on a new group, consisting of 85 students in the elementary psychology 
classes. The first criterion used was the term grade earned by these 85 
students, predicated upon three semester tests and a final examination 
with a combined reliability for the four tests of .96. The criterion in 
this instance was both objective and highly reliable. 
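
The scoring rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `score_60_items` and the representation of an answer sheet as a dictionary from item number to "+" or "-" are hypothetical conveniences, not part of the original study.

```python
def score_60_items(answers):
    """Score an answer sheet on the 60 differentiating items.

    `answers` maps an item number (1-60) to "+" or "-" (an assumed
    encoding of the plus/minus marks). One point is given for each
    minus answer on items 1-42 and for each plus answer on items 43-60.
    """
    score = 0
    for item, mark in answers.items():
        if 1 <= item <= 42 and mark == "-":
            score += 1
        elif 43 <= item <= 60 and mark == "+":
            score += 1
    return score


# A sheet answered entirely in the keyed direction earns all 60 points.
fully_keyed = {i: ("-" if i <= 42 else "+") for i in range(1, 61)}
print(score_60_items(fully_keyed))
```

Under this rule a higher score lies in the direction of the achieving group, which agrees with the reported means of 24.6 and 39.6.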

The same scoring was used for the 85 students on the 60 items as for
the experimental groups. The Pearson product-moment coefficient of
correlation with the criterion of psychology grades was .390. As a check of the
possible saturation of the scores thus derived with intellectual factors, 
the MMPI scores were correlated with standard scores derived from the
Altus Measure of Verbal Aptitude. Rather surprisingly to the investigator,
r proved to be .15, showing a slight positive relationship with intelligence,
which was presumed to be ruled out by means of the original
equated groups. Three possible reasons for the residual correlation of 
.15 with intelligence are these: (1) The full range of aptitude was not 
represented in the experimental groups, owing to the criteria used in 
their selection; (2) it is probable that the manner of selection of the ex- 
perimental groups caused a biased sampling, that is, selecting those 
working significantly above or below their tested aptitude, so that the 
non-intellective factors which characterize them are not perfectly charac- 
teristic of other students of like aptitude; or (3) it may be that the cri- 
terion of intelligence here employed, the Altus Measure of Verbal Aptitude, 
incorrectly shows the two groups to be the same in aptitude when the 
variation between their respective college grade averages (which is in 
itself a fairly adequate criterion of intellectual capacity) was so great— 
and as a consequence, the loading of the presumptively “non-intellective”
MMPI items with intelligence will show up to an appreciable degree 
when the items are applied to a new group, as happened in the present
case. Despite these restrictions, however, the bias is not too great since 
the per cent of overlap between the MMPI items and grades achieved in 
psychology is 15.21 (squaring the coefficient of .39) while the per cent of 
overlap between MMPI and aptitude is only 2.13. Roughly 13 per cent 
(15.21% less 2.13%) of the relationship between MMPI items and psy- 
chology grades represents non-intellective factors, i.e., factors not ac- 
counted for through the measure of aptitude employed. 
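
The overlap arithmetic in this paragraph can be retraced as follows. The sketch assumes only that "per cent of overlap" means the squared correlation times 100; the function name `percent_overlap` is illustrative. Note that squaring the rounded aptitude coefficient of .15 gives 2.25 rather than the reported 2.13, so the reported figure presumably rests on an unrounded coefficient (an assumption, since the text does not say).

```python
def percent_overlap(r):
    """Percentage of shared variance implied by a correlation r."""
    return 100 * r ** 2


overlap_grades = percent_overlap(0.39)      # MMPI items vs. psychology grades
overlap_aptitude_reported = 2.13            # figure as reported in the text
residual = overlap_grades - overlap_aptitude_reported

print(round(overlap_grades, 2))             # overlap with grades
print(round(percent_overlap(0.15), 2))      # overlap implied by the rounded r of .15
print(round(residual, 2))                   # the "roughly 13 per cent" in the text
```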

Feeling that the criterion of psychology grades was too parochial, even 
though completely objective and highly reliable, the writer next com- 
puted honor point ratios for each of the 85 students, in which all grades 
earned at the local college were averaged so that a Pearsonian coefficient 
could be computed. The manner of computation of the honor point ratio 
is given in a footnote to Table 1. The r between the 60 points from the
MMPI and the honor point ratio was .23, a coefficient which does not
quite meet the requirements for significance at the .01 level, though it
does at the .05 level. An r of this size, .23, is too close to the intercorrelation
of aptitude and the 60 points from the MMPI, .15, for the former to be 
of much use in a multiple correlation for predicting the present criterion, 
the average college grade. Consequently, the 60 MMPI items were 
analyzed by the upper and lower quartile method, the criterion of the 
analysis being, of course, the honor point ratio of the individual student. 
Twenty-six of the 60 items were found to be associated with college grade 
average. In Table 3 these 26 items have an asterisk preceding the 
number of the item. The remaining 34 items from the original 60 were 
so close to zero in the quartile analysis that they were discarded in the 
final scoring. Twenty-five of the 26 items which discriminated when the 
criterion was changed showed differences in the same direction as deter- 
mined by the experimental groups. The one item which changed direc- 
tion was item 43, “I am apt to pass up something I want to do when others 
feel that it isn’t worth doing.” This item seems logically to tap nearly 
the same attitude as does item 60, “My conduct is largely controlled by
the customs of those about me,” but apparently it is a psychologically
rather different question. The likenesses between the two types of scoring
are so much greater than this shift of one item might imply that one must 
conclude that the non-intellectual factors entering into total grade average 
and into a single highly reliable course grade are closely akin. The 26 
items retain a sample of the social, infantile, feminine and manic
tendencies of the non-achieving student similar to those found in the mother
lode of the original 60 items.
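
The significance statement made earlier in this paragraph for the r of .23 can be checked with the usual t transformation of a correlation coefficient. The sketch below is not taken from the study: the critical values for 83 degrees of freedom (two-tailed) are approximate table values supplied here as assumptions, and the function name is illustrative.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Approximate two-tailed critical values of t for 83 degrees of freedom
# (assumed from standard tables, not from the original paper).
T_CRIT_05 = 1.989
T_CRIT_01 = 2.636

t = t_from_r(0.23, 85)
print(round(t, 2))
print("significant at .05:", t > T_CRIT_05)
print("significant at .01:", t > T_CRIT_01)
```

On these assumptions the coefficient clears the .05 requirement and falls short of the .01 requirement, as the text states.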

When the papers of the 85 students were re-scored on the basis of the 
26 items thus derived, the following Pearson product-moment coefficients 
of correlation resulted: with honor point ratio, .39; with elementary
psychology term grades, .40; with the Measure of Verbal Aptitude, .21. The
somewhat surprising aspect of the r’s just given is not that the r with 
honor point ratio increased from .23 to .39—that would be expected 
owing to the manner of selecting the 26 items—but that the new scoring 
is slightly better for the original criterion, grades earned in psychology, 
than it had been when all 60 items were used. The difference is obviously 
not significant, .39 to .40; what is significant is that the original validating 
coefficient of .39 did not drop at all when a new criterion was used for 
selecting the items to be scored. The raw inference would be that non- 
intellective factors which are associated with the more inclusive honor 
point ratio are approximately the same as those which correlate with a 
single, highly reliable course grade. The reverse finding does not appear, 
however, to be true, that all non-intellective items associated with a 
single course grade are the same as for the less parochial average college 
grade. It is of parenthetical interest that more effective items might, 
perhaps, have been derived from the MMPI if the original basis for selec- 
ting the experimental groups had been the honor point ratio instead of an 
average standard score on two psychology semester tests. 

It will be noted that the saturation with aptitude in the two scorings, 
the 60-item test and the 26-item test, rose from .15 to .21. The latter 
coefficient is not quite significant at the .01 level. Although this r of 
.21, aptitude versus the 26-item test, may appear to be relatively large, 
it does not markedly reduce the validating coefficients of .39 and .40 for 
the two criteria since both coefficients remain above .30 when aptitude 
is held constant by partial correlation technique. The presence of any 
overlap whatever between what were supposed to be non-intellective items
and intelligence does indicate, however, a fault in the technique
employed. Better than using relatively small groups equated in aptitude
and differing in grade achievement would have been the survey of all 
students taking elementary psychology on the MMPI as well as on the 
test of Verbal Aptitude. If both tests had been administered to all 
students, the well-known quartile technique of item analysis could have 
been employed with both test variables, the criteria being the same as 
those here employed, grades in psychology and honor point ratio for all 
college grades. By the use of this technique, no items correlating posi- 
tively with aptitude would have been retained, thus assuring that the 
score derived from items so isolated would have a zero correlation with 
aptitude. In this manner questions could have been found which are 
truly non-intellective. And it is through valid non-intellective scales 
that a higher order of prediction for academic work will be made possible 
since the tests thus derived will not overlap the functions measured by 
the traditional intelligence test, which has proven validity for the
prediction of academic success.


Summary 


Two equated groups of elementary psychology students were given the 
group form of the MMPI. The basis of the equating was equality of 
intelligence test score and divergence in terms of psychology test scores of 
one-half sigma or more above or below the intelligence test score. In 
this manner two groups of 25 students each were obtained, one group 
being called the Achievers since it represented students working one- 
half sigma or more above their tested aptitude while the other, the non- 
Achievers, consisted of those working one-half sigma or more below 
their tested aptitude. The following findings may be summarized: 


1. The trend on eight of the nine clinical scales of the group MMPI 
was for slightly greater maladjustment on the part of the non-achiev- 
ing students. 

2. The only scale showing a difference significant at the .01 level between the
mean scores of the two groups was Hypomania.

3. Subsequent item analysis of the MMPI in terms of final psychology 
grade revealed 60 items which showed a difference of five or more points 
between the two groups. A study of the 60 items indicated that the 
answers of the non-achieving group could be characterized as revealing 
greater femininity, immaturity, fearlessness, self-assertiveness and manic 
tendencies than the achieving group. The best single bi-polar concept 
characterizing the answers of the two groups seemed to be the traditional 
introversion-extroversion, when emphasis is placed upon its social aspects. 
The answers of the achievers revealed introversive tendencies; those of 
the non-achievers, a love of and a dependence on people, here called 
social extroversion. 

4. When the 60 “non-intellective” items were administered to a new
group of 85 students, the 60 items correlated .39 with psychology term 
grades. The same score yielded an r of .23 with honor point ratios for 
total college grades of the 85 students. 

5. When the 60 non-intellective items were analyzed by the upper and
lower quartile method with honor point ratios as the criterion, 26 items
were retained. These 26 items correlated .39 with honor point ratio, .40 
with psychology term grades, .21 with the intelligence test used in the 
study. 


The data here reported, though based on a small number of cases, 
appear to justify the belief that if the correct method of selecting them 
is used, adjustment items can be found which will be associated with 
academic achievement and have no relation whatever to intelligence as
it is currently measured. The usefulness of such a non-intellective scale 
in conjunction with a valid intelligence test in predicting academic 
achievement needs no elaboration. 


Received January 3, 1948. 
College Grades and the Group Rorschach


Grace M. Thompson 
University of California at Berkeley 


The lack of a perfect correlation between college grades and intel- 
ligence, as it is currently measured, is probably due in part to imperfec- 
tions in the measures of intelligence as well as to the unreliability of the 
criterion employed. Yet to a greater degree it is undoubtedly an indica- 
tion that other factors besides sheer academic ability are of considerable 
importance in determining any single student’s academic success. The 
measurement of such personality factors, therefore, seems to be of para- 
mount importance to present-day education, whether in its guidance, 
grouping, or admissions programs. A measure of these non-intellectual 
factors is unfortunately not yet available in the armamentarium of the 
tester, who continues to rely solely on the easily obtained aptitude test 
scores. 

One of the promising techniques now being employed to measure the 
adjustment aspects of academic success is the Group Rorschach, which 
has already been used to advantage on a large scale by Ruth Munroe and 
her colleagues at Sarah Lawrence College, where it is now standard en- 
trance procedure. In her monograph on this topic, Munroe (4) reports 
the rather surprising finding that her Inspection Rorschach (an ab- 
breviated check list to be used by examiners well acquainted with the 
test) was associated with grades to a somewhat higher degree than the 
ACE Psychological Examination, one of the most widely used aptitude 
tests at the college level. Whereas the ACE was more successful in 
predicting success, the Rorschach was more successful in predicting 
academic failure. 

Further indications of the validity of the Group Rorschach used at the 
college level in the differentiation of achieving and non-achieving college 
students, when intelligence was held constant, are to be found in a study 
by Montalto (3) at the University of Cincinnati. Such findings are 
certainly hopeful for the prognosis of large scale personality measurement. 
It would appear, however, that several aspects of the scoring and evalua- 
tion of the test must be simplified before such results could be achieved 
consistently: a fully standardized method of scoring, sufficiently objective 
so that ideally it would require less stringent training than is necessarily 
demanded at present of its scorers; and a thoroughly quantified
interpretation which would allow the assignment of a numerical value to each
protocol, thereby avoiding the present subjectivity of the test’s usage,
even in the hands of Munroe. 

The following study represents a limited attempt to investigate the 
possibility of using the Group Rorschach in predicting academic success 
by those factors inherent within the test which are associated with grades
but not related to intelligence as we are now able to measure it. 

The Group Rorschach was administered to a beginning psychology 
class at Santa Barbara College of the University of California, using the 
standard slides projected on a screen and following Munroe’s method of 
administration. The class was composed of 128 students, who were en- 
rolled in a representative sampling of the college curricula. Sixty-three 
per cent were men; thirty-seven per cent were women. 


Table 1
Rorschach Items Investigated

 1. Total number of responses (R)
 2. Number of responses using the whole blot area (W)
 3. Number of responses using large detail areas (D)
 4. Number of responses using small detail areas (Dd)
 5. Per cent of total responses in which animals are the primary content (A%)
 6. Per cent of responses using the whole blot area (W%)
 7. Per cent of responses using large details (D%)
 8. Per cent of responses using small details (Dd%)
 9. Number of responses using the white spaces (S)
10. Total number of popular responses (P)
11. Per cent of responses on last three cards (8+9+10%)
12. Per cent of responses using animals and humans as primary content (A+H%)
13. Ratio between whole human and animal figures and human and animal details, as
    legs, eyes, etc. (A+H:Ad+Hd)
14. Ratio of whole human figures to human details (H:Hd)
15. Total responses on the achromatic cards (Ach R)
16. Total responses on the chromatic cards (Chr R)
17. Per cent of total responses on the achromatic cards (Ach%)
18. Total number of content categories (Cont. Cat.)
19. Anatomy and sex responses (An, Sex)
20. Number of human movement responses seen (M)
21. Human movement responses in small blot areas (M in Dd)
22. Number of responses with animals in motion (FM)
23. Ratio of human to animal movement responses (M:FM)
24. Number of vista or perspective responses
25. Number of responses describing movement of natural forces or generally inanimate
    objects (m)
26. Pure color responses, or those in which color is the primary determinant and form
    secondary (C, CF)
27. Responses using color, with form predominant (FC)
28. Color sum: evaluating each pure color response as 1½, each color-form as 1, each
    form-color as ½ and summating values
29. Ratio between M responses and color sum (M:C)
30. Number of responses using shading as determinant (Y)
31. Ratio of whole responses to human movement responses (W:M)
32. Per cent of responses determined purely by form of blot (F%)
33. Number of good form responses (F+)
34. Per cent of good form responses within total form responses (F+%)
35. Number of poor form responses (F-)
36. Presence of popular M response in Card III (M in III)
37. Presence of popular M response in Card II (M in II)
38. Presence of popular response for Card V (P in V) as a whole
39. Presence of popular response in lateral detail of Card VIII: animal, bear, etc.
    (P in VIII)
40. Statement by subject that color used as determinant
41. Organization total for whole series of cards (Z)
42. Average organization value per response (Ave Z)
43. Total organization on the achromatic cards (Ach Z)
44. Total organization on the chromatic cards (Ch Z)
45. Per cent of organization total on the achromatic cards (Ach Z%)
46. Presence of any responses on the last three cards utilizing the whole blot with no
    other determinant than form (WF last 3)
47. Refusal of any of the cards
48. Presence of any of the following: color naming, blot description, marked
    perseveration of one response
49. Total poor form responses on all cards, including minus responses for all
    determinants: M-, FC-, etc.
50. Per cent of popular responses (P%)
51. Number of space and small detail areas combined (Dd+S)
52. Total number of shading and perspective responses (FV+Y)





Each test was scored according to the usual method for the scoring 
of an individual administration, following Beck (1), whose numbered 
delineations of areas, tables for plus and minus form values, and broad 
categories of symbols for determinants seemed to lend themselves best 
to the desired objectivity and reliability demanded by the group method. 
Each protocol was scored in addition for Klopfer’s FM, or animal move- 
ment, and m, or the movement of natural forces and generally inanimate 
objects, which the research of Munroe and others had indicated to be 
significant items. A list of 52 Rorschach factors (see Table 1) generally 
considered to be of interpretative significance was then used as a basis 
for summarizing the protocols of the 128 students; it will be observed 
that each factor was one which could be put down in a single purely 
quantitative form. 
In order to determine, first of all, which of the 52 Rorschach items
were associated with grades, it was necessary to extract the top and bot- 
tom quartiles from the class distribution in semester grades. The single 
criterion of the final grade in the psychology 1A class was used, since this 
grade constituted a highly reliable criterion, being based on a series of
objective tests whose combined reliability had been previously shown to be
.96. The two extreme quartiles were then compared on the findings 
of each of the separate 52 Rorschach categories in order to discover those 
particular items which showed a direct relationship to the criterion. Any 
item which showed a difference greater than four points between the 
absolute number of cases in each of the extreme quartiles having the 
item in question was retained as being of possible differentiating value. 
For example, item 1, or total number of responses: 23 students in the
upper quartile had fewer than 30 responses, whereas only 15 in the lower
quartile had fewer than 30 responses. The difference between the quartiles
(23-15) exceeding four, the item was retained as of possible value.
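
The retention rule just described amounts to a comparison of two counts. The sketch below is an illustration only: the function name `retain_item` is hypothetical, and the use of the absolute difference is an assumption made so that an item favoring the lower quartile (and therefore given a negative weighting) would also be caught.

```python
def retain_item(upper_count, lower_count, threshold=4):
    """Keep an item when the quartile counts differ by more than `threshold`.

    `upper_count` and `lower_count` are the numbers of students in the top
    and bottom grade quartiles whose protocols show the sign in question.
    The absolute difference is used (an assumption) so that items pointing
    in the negative direction are retained as well.
    """
    return abs(upper_count - lower_count) > threshold


# Item 1 from the text: R fewer than 30, found in 23 upper-quartile
# and 15 lower-quartile protocols.
print(retain_item(23, 15))
```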

Thirty-four items met this minimum requirement; since the method 
of selection was admittedly only a rough and general one, each of the 34 
was given an equal weighting of one, and each separate Rorschach pro- 
tocol assigned a numerical score on the basis of how many of the 34 items 
the paper in question possessed. When these tentative numerical scores 
were correlated with the original criterion of grades, a Pearson product- 
moment coefficient of .52 was obtained, indicating that there is a definite 
relationship between certain factors in the Rorschach and the present 
criterion. 
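
The unit-weight scoring and its correlation with grades can be sketched as follows. The sign labels, the sample protocols and the grade values are all hypothetical and serve only to show the mechanics; the study's own data are not reproduced here.

```python
import math


def unit_weight_score(protocol_signs, retained_items):
    """Count how many of the retained items a protocol possesses (weight 1 each)."""
    return sum(1 for item in retained_items if item in protocol_signs)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical retained items and protocols, for illustration only.
retained = ["R<30", "W>4", "D<18", "m present"]
protocols = [
    {"R<30", "W>4", "D<18"},
    {"R<30"},
    {"W>4", "m present"},
    set(),
]
grades = [3.5, 2.0, 3.0, 1.5]  # hypothetical term grades

scores = [unit_weight_score(p, retained) for p in protocols]
print(scores)
print(round(pearson_r(scores, grades), 2))
```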

Since comparison of the two quartiles had indicated some of the 
Rorschach items to possess far more discriminating value than others, 
16 of the most valid items among the 34 were employed in a second cor- 
relation to see whether or not a smaller number of items would yield as 
adequate a validating coefficient. When this shortened scoring was cor- 
related with psychology grades, r was found to be .50. 

Since the original purpose of the study was to investigate chiefly the 
personality or adjustment factors related to academic success, it seemed 
likely that this coefficient was spuriously high, since many of the 34 items 
may have been simply Rorschach variables reflecting the same sort of 
intelligence that one of the usual aptitude tests could have measured 
with a much smaller expenditure of time and effort. Accordingly, each 
of the original 52 items on which the protocols were scored was again 
item-analyzed, this time using the top and bottom quartiles of aptitude, 
as measured by the Altus Measure of Verbal Aptitude, a short verbal test 
which had been shown to give a correlation of .64 with the original cri- 
terion of grades. 







When the aforementioned 16-item Rorschach scoring was correlated 
with aptitude, it was found to give a coefficient of .43, high enough to 
insure—even without running a multiple correlation—that the addition 
of the Rorschach scoring in that form would add little to the prediction 
of college success beyond that already offered by the aptitude test. 


Table 2
Rorschach Items Associated with Grades

Grade
Quartiles
Q4-Q1     Rorschach Item

23-15     R fewer than 30
25-19     W more than 4
24- 8*    D fewer than 18
14- 3*    Dd 0 or 1
 6- 2     A% under 25%
18-10*    W% more than 24%
20-12     D% under 65%
22-17     Dd% under 15%
29-22*    S fewer than 4
 5- 2     P more than 8
21-14*    8+9+10% under 35%
 8- 4     A+H% between 41% and 54%
25-21     A+H:Ad+Hd equal to or more than 2(Ad+Hd)
19- 4*    H more than 2Hd
 8- 1     Chr R fewer than 10
21-13*    Achr% more than 44%
25-21     Cont. Cat. fewer than 13
28-22     M more than 2
25-20*    M more than FM
17- 8*    m present
20-12     FC more than C plus CF
30-23*    C sum less than 4.5
28-16*    C sum less than M
25-19*    W:M equal to or less than 2:1
31-26     F+% more than 69%
15- 8     C not stated as determinant
30-25     Z more than 19
 Q- 4*    Ave. Z two or more
 7-12*    Ave. Z less than 1 (negative weighting)
29-20*    Ach Z more than 9
20- 8     Ach Z% more than 39%
20-11*    Total neg. R fewer than 3
31-26     P% more than 14%
26-19     Dd plus S fewer than 9
12- 6*    FV plus Y = 0 or 1

* Items retained for shortened scoring with grades.
Table 3
Rorschach Items Associated with Aptitude

Aptitude
Quartiles
Q4-Q1     Rorschach Item

22-14     R fewer than 30
27-19     W more than 4
20-13     D fewer than 18
28-17     Dd fewer than 5
23-12     A% under 40%
20-11     W% more than 24%
19-14     D% under 65%
24-14     Dd% under 15%
27-20     S fewer than 4
16-12     P more than 6
29-21     A+H% between 40% and 74%
22-17     A+H more than 2(Ad+Hd)
21- 8     H more than 2Hd
15- 9     Chr R fewer than 15
22-14     Ach% more than 44%
20- 6     M more than 3
19-14     FM fewer than 3
26-13     M more than FM
17-12     FV present
20- 8     m present
19- 8     FC more than C plus CF (or both 0)
24-15     C sum less than M
22-13     W:M equal to or less than 2:1
16- 8     F% more than 60%
17-11     F+% more than 84%
30-22     M in III
20-15     M in II
32-24     Z more than 19
11- 1     Ave. Z more than 2
 2-17*    Ave. Z less than 1
30-18     Ach Z more than 9
18-13     Ach Z% more than 39%
 4-11*    Persev., color naming, blot desc.
19- 8     Total neg. R fewer than 3
31-23     P% more than 14%
29-17     Dd plus S fewer than 10

* Given negative weighting.


Items which analyses showed to be too highly associated with intel- 
ligence were accordingly discarded in the final scoring. 

It was of interest to see how strong the relationship between certain of 
the Rorschach factors and a measure of aptitude would actually be. 



































When the best 35 were extracted from the original list of 52 and a quanti- 
tative scoring assigned to each student’s paper by summating the number 
out of the 35 which his protocol possessed, a correlation of .51 was ob- 
tained between this scoring and the scores on the aptitude test. It was 
interesting to note that even within this homogeneous group of college 
students, the single item of M—or human movement responses seen in 
the cards—gave an r of .37 when correlated singly with aptitude. 

In order to avoid an overlap between the functions of the aptitude 
test and the Rorschach, it was necessary to retain in the final Rorschach 
scoring only those items which had, through a comparison of the two 
separate sets of item analyses, proved themselves to be positively associ- 
ated with grades, yet at the same time minimally or negatively associated 
with aptitude. It will be seen, however, from even a cursory examination 
of Tables 2 and 3 that such items were relatively rare; therefore, several 
were retained which showed a positive relation to aptitude, so long as 
their relation to grades was more marked. 

Twenty items fulfilled the requirements, and were accordingly used 
as the basis of a final scoring of the Rorschach protocols for the
non-intellective factors related to school achievement. The same procedure
was followed as before, that is, each of the 20 items present was assigned 
a weighting of plus one, and the individual students’ records given a final 
numerical score according to how many of the 20 his particular test had 
possessed. A Pearson product-moment correlation of .38 was obtained 
between this scoring and the original criterion of term grades, whereas 
this same Rorschach scoring gave a coefficient of only .04 when correlated 
with the Altus Measure of Verbal Aptitude. The correlation between 
this Rorschach scoring and grades when aptitude was partialled out was 
.46. A multiple correlation, showing the combined influence of the ap- 
titude test and the Rorschach in predicting grades, was found to be .73, 
an appreciable rise from the original correlation of .64 between aptitude 
and grades. 
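
The partial and multiple correlations reported above follow from the three zero-order coefficients given in the text (.38, .04 and .64). The sketch below applies the standard first-order partial correlation and two-predictor multiple correlation formulas; the function names are illustrative, and the sketch is offered as a check on the arithmetic rather than as the author's own computation.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors whose intercorrelation is r_12."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


R_RORSCHACH_GRADES = 0.38    # 20-item Rorschach scoring vs. term grades
R_RORSCHACH_APTITUDE = 0.04  # 20-item Rorschach scoring vs. aptitude
R_APTITUDE_GRADES = 0.64     # aptitude test vs. term grades

partial = partial_r(R_RORSCHACH_GRADES, R_RORSCHACH_APTITUDE, R_APTITUDE_GRADES)
multiple = multiple_r(R_APTITUDE_GRADES, R_RORSCHACH_GRADES, R_RORSCHACH_APTITUDE)

print(round(partial, 2))   # 0.46
print(round(multiple, 2))  # 0.73
```

Both values agree with the reported .46 and .73. The partial coefficient exceeds the zero-order .38 because removing aptitude, which is nearly uncorrelated with the Rorschach scoring, removes variance in grades that the scoring was never in a position to explain.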

A shortened scoring, using only 13 of the 20 non-intellective items, 
gave a slightly lower correlation of .34 with grades, and .07 with aptitude; 
it was therefore deemed advisable, at least for this particular group, to 
retain the original 20 point scoring. 

Several interesting clusters of Rorschach patterns seem to appear 
upon the examination of the 20 non-intellectual variables. The first of 
these is the tendency for the achieving students to concentrate their re- 
cords into fewer responses than the non-achievers. They tend to have 
fewer chromatic responses, fewer achromatic responses, and fewer content 
categories. They make less use of the large detail areas ordinarily ac- 
cepted as indicating a common-sense, matter-of-fact approach. These 
large details are smaller, in relation to the total record, both absolutely 
and in terms of ratio. 











Table 4 
Non-Intellectual Rorschach Items 

  Grade      Aptitude 
Quartiles    Quartiles 
  Q4-Q1        Q4-Q1      Rorschach Item 

  24- 8        20-13      D fewer than 18* 
  14- 3         9- 7      Dd 0 or 1* 
  20-12        19-14      D% under 65% 
   5- 2         2- 2      P more than 8* 
  21-14        19-17      8+9+10% under 35% 
   8- 4         5- 5      A+H% between 41% and 54%* 
  25-21        23-23      A+H equal to, or more than, 2(Ad+Hd)* 
  21-15        18-15      Achr R fewer than 13 
   8- 1         7- 4      Chr R fewer than 10* 
  25-21        16-20      Cont. Cat. fewer than 12* 
  20-20        18-20      An, Sex present 
  16-14        15-20      FV absent* 
  30-23        25-27      C sum less than 4.5* 
  28-16        24-15      C sum less than M 
  11-10         9-11      Y more than 3 
   5- 6         2-10      M not in III* 
  17-17        15-18      P not in VIII 
  15- 8        10- 7      C not stated as determinant* 
  20- 8        18-13      Achr Z% more than 39%* 
  12- 6        10- 8      FV+Y equal to 0 or 1* 

* Items retained in shortened scoring. 


Further, there appears to be a shying away from color. It is a better 
sign for grade-getting not even to mention color as a determinant. The 
proportion of responses on the last three colored cards is smaller for the 
achieving group, and there are more of them that have a color sum less 
than four and one-half. More also show an M:C (human movement to 
amount of color) ratio balanced on the side of human movement, in 
theory a more introverted pattern. 

Not only does there seem to be a shying away from color, but ap- 
parently a correspondingly greater interest of the achievers in the achro- 
matic cards. They use shading to a slightly greater degree (Y more 
than 3). They organize the material more adequately on the dark cards 
(Achr Z% more than 39%). Evidently there are more of them in each 
extreme of the use of shading, however, since it is also a favorable sign 
for both perspective and texture responses to be omitted altogether. 

There is a slight amount of evidence to suggest that the achievers 
are more conforming; at least they use more popular responses in general, 
although they react to a slightly lower degree to the two popular M 
responses in Cards II and III than might be expected. 

The remaining discriminating items do not seem to fit a recognizable 
pattern. The probability that the achievers might attend less to the 
insignificant details (Dd) is in the direction of what the conventional 
Rorschach interpretation might lead one to expect. Similarly the ratio 
of human and animal wholes to human and animal detail responses is not 
a surprising finding, since a healthy superiority of the former is generally 
considered advisable. Why the presence of anatomy and sex responses 
should have no apparent relationship to academic adjustment and only 
a slightly negative one to aptitude is somewhat more surprising, especially 
in view of the fact that Harrower-Erickson has listed their presence as 
one of the least desirable of the Rorschach items discriminating between 
psychoneurotics and normals. 


Summary 


The Group Rorschach test, when administered to a class of 128 college 
students, was found to be a valid predictor of the adjustment or motiva- 
tional aspects of grades, to the extent that a quantified scoring of the test 
papers gave a correlation of .38 with the criterion of semester psychology 
grades, and an r of only .04 with a measure of verbal aptitude. 

It is suggested, therefore, that the Group Rorschach may eventually 
prove useful as a large scale, practical, and objective tool for the measure- 
ment of those factors influencing grades which are not purely intelligence 
factors in the sense that they are capable of measurement by our standard 
aptitude tests. It will remain to be seen, of course, whether these same 
items—and undoubtedly, not all of them—will remain valid under other 
conditions and in other college groups. Cross-validation of the Ror- 
schach factors here isolated should be undertaken on further groups before 
any conclusive diagnostic weighting could be assigned to any of them. 
It would be expected that some loss in predictive value contributed by 
those particular items would occur in the process of cross-validation; and 
only actual practice could demonstrate to what extent the relationship 
described here between the Rorschach and academic achievement would 
be verified on repetition. Several comments, however, appear justified 
here: first, that the Group Rorschach can be quantified and still retain 
diagnostic value—a finding which would corroborate that of Munroe and 
others. The advantages of quantification and the group method of ad- 
ministration would appear to lie not only in the time of administration, 
one hour for a whole group, but also in the scoring, subjective elements 
being minimized by adherence to a predetermined set of categories which 
yielded an objective scoring that any other experimenter could apply 
equally well. Median scoring time in the present study was approxi- 
mately half an hour per record, and might be expected to vary depending 
on the number of Rorschach factors investigated, experience of scorer, etc. 
In the event that objective scoring and interpretation could prove prac- 
tical on a large scale, it would also seem probable that the present strict 
requirements for qualified Rorschach scorers could be lessened somewhat. 

Finally, then, it would appear that the Group Rorschach could be 
used in the prediction of academic success above and beyond the predic- 
tion offered by a standardized intelligence test, and it is to be hoped that 
further research will expand the practical use of the method. 


Received November 6, 1947. 


A Follow-up Study of Personal Counseling Versus 
Counseling by Letter * 


C. Harold Stone and Irving Simos 
University of Minnesota 


The widespread vocational counseling of returned veterans of World 
War II by the Veterans Administration, by other public and private 
agencies, and by universities and colleges has brought to the fore 
the question of methods of reporting to the counselee information con- 
cerning his aptitudes, abilities, interests, and possible areas for training. 
In numerous instances in the State of Minnesota, veterans who have 
taken advantage of the advisement service offered by the Veterans Ad- 
ministration have requested written reports of test results and counselors’ 
recommendations. The apparent purpose of such requests has been to 
obtain a report which might be of benefit when applying for employment 
as well as a written record for future reference. Counseling letters sup- 
plementing the personal interview have been utilized by several VA 
Advisement Centers, and in one instance a follow-up study was made to 
determine what use the veteran made of the report (1). 

In view of this current interest in the use of written reports in coun- 
seling, results of a study conducted by the Employment Stabilization 
Research Institute in 1942 have been summarized for presentation with 
the thought that they may be of general interest. 

During the late fall of 1941 and early winter of 1942, a ten per cent 
sample, totalling 415 unemployed persons, was selected randomly from 
registrants filing for employment in the St. Paul office of the United States 
Employment Service. A comprehensive battery of aptitude, interest, 
ability, and personality tests was administered to the group, personal 
data and occupational history were obtained by interview and clearance 
with the Confidential Social Service Exchange, and reported occupational 
histories were verified through personal contacts with local employers 
and letter contacts with out-of-town employers. Careful analyses of all 
case data were then made for each person to determine job possibilities in 
relation to existing and projected local employment opportunities. 

* The study reported herein was conducted as a part of the studies of occupational 
competence of unemployed persons in St. Paul under the direction of the first author 
who was supervisor of studies of frictions in the labor market. Dale Yoder and Donald 
G. Paterson were co-directors of the Employment Stabilization Research Institute 
Study of Employment, Unemployment and Relief in St. Paul, 1939-1942. Results of 
the larger study are published in Local Labor Market Research, University of Minnesota 
Press, 1948. 

In reporting analyses of results to the individuals participating, the 
sample was divided randomly into two groups with persons in one group 
being counseled individually by the staff counselor¹ and those in the other 
group receiving written reports in the form of a “counseling letter.” 
During the conduct of the study, 214 cases were counseled personally and 
201 received counseling letters. The follow-up study reported herein 
includes 196 of the personally counseled cases and 184 cases who received 
counseling letters.² 

The counseling interview was normally about one hour in length, and 
after discussions of work history, previous training, test results, and other 
factors related to the occupational adjustment of the individual, specific 
plans of action were worked out by the counselee. More than one inter- 
view was required in many instances in order to aid the counselee to re- 
solve more adequately his problems of occupational adjustment. 

A standard outline was established as a framework for the counseling 
letter sent to those who did not receive the benefits of personal counseling. 
A summary of test results stated in general terms was included along 
with specific recommendations for suitable employment and training 
possibilities. A sample from the Institute files is shown in Fig. 1 
to indicate the form and general nature of the letters. 

In order to discover whether reactions to counseling by personal in- 
terview differed significantly from those to counseling by letter, and 
further, to obtain indications of the effectiveness of both methods in 
aiding the unemployed in job seeking and in improving their morale and 
self-confidence, a follow-up study was conducted in July, 1942 (ap- 
proximately six months after the testing and counseling). 

¹ Vivian J. Humphrey, now Senior Student Counselor, Student Counseling Bureau, 
University of Minnesota. 

² Eighteen of the Counseled group (16 full-time employed and 2 incomplete) and 
seventeen of the Letter group (16 full-time employed and 1 incomplete) are excluded 
in this report. The inclusion of employed persons in the total sample resulted from the 
random method of selection of employment office registrants. 


Mr. J. Y. T. 
St. Paul, Minnesota 

Dear Mr. T.: 

The following is a report of the results of the interviews and tests which you 
took at the Employment Research Center. We hope that this information will be 
helpful to you in seeking work or in preparing yourself for future employment 
by pointing out a number of job possibilities for which you seem to be fitted. 

Work For Which You Appear Qualified By Experience And Training 

On the basis of your work experience alone, your best immediate job opportunities 
would appear to be in work such as wrapper and packer or in the operation of some 
factory machines. You could also qualify as a painter's helper, plumber's helper, 
or possibly as a truck driver. 

Job Opportunities Open To You On The Basis Of Your Measured Capacities 

An analysis of the results of your tests indicates that you have excellent mechani- 
cal ability and superior ability to work at jobs requiring the rapid and accurate 
use of your fingers and small tools. Your clerical ability is only fair, and it 
is not advised that you seek training for, nor employment in office work. It is 
also recommended that you do not attempt work as a salesman. Your interests are 
similar to those of men who are successful skilled tradesmen, as for example, men 
who work as painters, carpenters, printers, and machinists. These results indicate 
that you should be successful as a semi-skilled worker in a factory or working at 
machines which would not involve a long training period. The suggestions made in 
the preceding paragraph are further indicated by the test results. 

Work For Which You Can Qualify If You Secure Additional Training 

It is strongly recommended that you secure additional training in some trade. You 
should investigate the possibility of taking courses at the St. Paul Vocational 
School. Since there is considerable demand for these courses, however, you may 
find that you can obtain training more quickly in a reliable private trade school 
such as Dunwoody Institute in Minneapolis. Your excellent mechanical ability 
indicates very strongly that you should secure training in some trade, such as 
machinist, mechanic, plumber, or in some other mechanical trade in which you may 
have a special interest. 

You may use this letter when applying for a job if you wish your prospective 
employer to know of our recommendations. If a prospective employer is interested 
in obtaining additional interpretations of the interviews and tests, we shall be 
glad to supply further information at his request. 

Very truly yours, 

(signed) 
Member of Research Staff 

Fig. 1. Sample of Counseling Letter. 


The follow-up study was conducted by the mailed questionnaire 
method. Equivalent questionnaire forms were mailed to members of the 
counseled and letter groups. Following Toops’ method of using follow- 
up letters to obtain maximal returns (2, 3), three follow-up letters to the 
questionnaire were used to reach a percentage of returns deemed adequate. 
Table 1 shows the per cent of questionnaire returns received from the two 
groups. No significant sex differences were found in the percentage of 
returns in either group. Returns, however, from Counseled cases (85%) 
were significantly greater than those from Letter cases (74%). Total 
return of all questionnaires sent was 80 per cent. 
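
The comparison of return rates (85 per cent of Counseled cases against 74 per cent of Letter cases) rests on a critical ratio, that is, the difference between two percentages divided by its standard error. The article does not state which standard-error formula was used, so the sketch below assumes the common pooled-proportion form and a two-tailed normal probability. The function names are hypothetical; the counts are the totals given in the text and in Table 1.

```python
import math


def critical_ratio(returned_a, sent_a, returned_b, sent_b):
    """Difference between two proportions divided by its pooled standard error."""
    p_a = returned_a / sent_a
    p_b = returned_b / sent_b
    pooled = (returned_a + returned_b) / (sent_a + sent_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    return (p_a - p_b) / std_error


def two_tailed_p(z):
    """Two-tailed probability of a normal deviate at least as large as |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


# Totals: 167 of 196 Counseled and 137 of 184 Letter questionnaires returned.
cr_total = critical_ratio(167, 196, 137, 184)
p_total = two_tailed_p(cr_total)
print(f"critical ratio = {cr_total:.2f}, P = {p_total:.3f}")
```

Under these assumptions the critical ratio for the total group comes out at roughly 2.6, close to the 2.67 printed in Table 1, with P below .01. The small gap may reflect rounding of the percentages or a different standard-error formula in the original computation.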

Table 1 

Returns of Follow-up Questionnaires Received from Unemployed Counseled by 
Personal Interview and Unemployed Counseled by Letter 

                          Male                    Female                Total 
                   Sent  Returned   %      Sent  Returned   %       Returned   % 
Counseled Cases     119     103     86      77      64      83        167      85 
Letter Cases        114      86     75      70      51      73        137      74 
Total Cases         233     189     81     147     115      78        304      80 

Critical Ratio (% Counseled vs. % Letter returns):  1.46  2.67 
P:  .144  .034  .008 


Table 2 shows in percentages the responses to items in the question- 
naire considered relevant to this report.³ Inspection of the table reveals 
an unusual consistency of response between the Counseled cases and the 
Letter cases in the majority of instances. It had been expected that 
much wider differences would be found favoring the Counseled cases. 

Areas of close agreement⁴ between males counseled by personal inter- 
view and those counseled by letter may be summarized by question num- 
ber as follows: 

1. Employed since report of test results received; 2. Employed at 
time of follow-up; 3. Job satisfaction; 5. Report of test results helped 
decide type of job to seek; 6. Report of test results disclosed latent abili- 
ties; 7. Self confidence increased; and 9. Employment service applicants 
should have opportunity to take tests. 

Study of the actual responses to questions 3, 5, 6, 7, and 8 indicates 
quite clearly that a high percentage of both males and females placed a 
high value on the effectiveness for them of the testing and counseling 
program. 

The widest discrepancy between the two male groups appeared in 
question 4. Two-thirds of the men counseled by personal interview felt 
that the discussions with the counselor helped them in subsequently 
talking to employers. Less than half of the men counseled by letter, 
however, found the written report helpful when talking to employers. 
The difference is statistically significant at the 1 per cent level. Similar 
differences for the women were found. However, it must be noted that 
these two questions are somewhat different in wording and implications 
and a direct comparison of them may not be logically justified. 

Although the majority of both groups responded favorably to question 
8, concerning helpfulness of reports of test results, the difference in favor 
of those individuals counseled personally is statistically significant. 

³ Questions included in the questionnaire and not summarized here are as follows: 
By what firm are you employed (if employed at present)? What is the title of your 
job? Exactly what do you do on your present job? How much do you earn per hour—
per week? Since receiving a report of your tests, to what firms have you applied and 
for what type of work? Have you taken any training course since receiving a report 
of the test results? Will you please give us any suggestions you may have for improving 
this type of service? 

⁴ Differences in percentages not statistically significant, P > .05. 


Table 2 
Comparison of Responses to Follow-up Questionnaire by Those Counseled by 
Personal Interview and by Those Counseled by Letter 

Note: Responses by sex and for the total group are shown in percentages, rounded 
off to yield 100% for each group. N’s are shown in Table 1. 

Question 1. 
Counseled: Have you been employed since you discussed your test results with our 
job counselor? 
Letter: Have you been employed since you received a report of your test results? 

                       Male          Female          Total 
                    Yes    No      Yes    No      Yes    No 
                     %      %       %      %       %      % 
Counseled Cases      92     8       69    31       82    18 
Letter Cases         93     7       56    44       78    22 
Total Cases          92     8       63    37       80    20 

Question 2. 
Counseled and Letter: Are you employed at present? 

                       Male          Female          Total 
                    Yes    No      Yes    No      Yes    No 
                     %      %       %      %       %      % 
Counseled Cases      85    15       61    39       75    25 
Letter Cases         85    15       54    46       72    28 
Total Cases          85    15       58    42       74    26 

Question 3. 
Counseled and Letter: Do you like the type of work you are now doing? 

                         Male               Female               Total 
                    Yes   No    ?      Yes   No    ?      Yes   No    ? 
                     %     %    %       %     %    %       %     %    % 
Counseled Cases      70    28   2       78    16   6       73    24   3 
Letter Cases         76    22   2       81    19           78    21   1 
Total Cases          73    26   1       79    18   3       75    23   2 

Question 4. 
Counseled: Do you feel the opportunity you had to talk over your test results with 
the counselor helped you in talking to employers? 
Letter: Did you find the letter which was sent to you giving a written report of 
your test results helpful when talking to employers? 

                         Male               Female               Total 
                    Yes   No    ?      Yes   No    ?      Yes   No    ? 
                     %     %    %       %     %    %       %     %    % 
Counseled Cases      66    28   6       69    23   8       67    26   7 
Letter Cases         44    46  10       42    50   8       43    47  10 
Total Cases          57    35   8       59    33   8       58    34   8 

Question 5. 
Counseled: Did the discussion of the test results help you decide what kind of job 
to look for? 
Letter: Did the report of the test results help you decide what kind of job to look 
for? 

                       Male          Female          Total 
                    Yes    No      Yes    No      Yes    No 
                     %      %       %      %       %      % 
Counseled Cases      67    30       68    32       68    31 
Letter Cases         63    37       64    36       63    37 
Total Cases          65    33       66    34       66    33 

Question 6. 
Counseled: Did an understanding of your test results disclose abilities which you 
did not know you had? 
Letter: Did the report of your test results disclose abilities which you did not know 
you had? 

                         Male            Female          Total 
                      Yes      No      Yes    No      Yes    No 
                       %        %       %      %       %      % 
Counseled Cases   [illegible]  41       69    31       62    37 
Letter Cases           57      43       57    43       58    42 
Total Cases            58      41       64    36       60    39 

Question 7. 
Counseled: Did your discussion with the counselor result in increased confidence 
in yourself? 
Letter: Did the written report result in giving you increased confidence in yourself? 

                       Male          Female          Total 
                    Yes    No      Yes    No      Yes    No 
                     %      %       %      %       %      % 
Counseled Cases      83    17       82    18       82    18 
Letter Cases         70    30       77    23       73    27 
Total Cases          77    23       80    20       78    22 

Question 8. 
Counseled: In general, do you believe that taking the tests and discussing the 
results with the counselor were helpful to you? 
Letter: In general, do you believe that taking the tests and receiving the written 
report of the results were helpful to you? 

                       Male          Female          Total 
                    Yes    No      Yes    No      Yes    No 
                     %      %       %      %       %      % 
Counseled Cases      86    13       83    17       84    15 
Letter Cases         76    24       70    30       74    26 
Total Cases          81    18       78    22       80    20 

Question 9. 
Counseled and Letter: Do you think that applicants at the Employment Office 
should be given tests? 

                             Male                         Female                               Total 
                   To        To Those  To       To        To Those  To              To        To Those  To 
                   Everyone  Who Ask   None     Everyone  Who Ask   None    ?       Everyone  Who Ask   None    ? 
                      %         %       %          %         %       %      %          %         %       %      % 
Counseled Cases      54        46                     [illegible]                     51        46       2      1 
Letter Cases          [illegible]       6          49        49       2               49        48       3 
Total Cases          52        46       2             [illegible]                     52        45       3 


Summary 


In summary, it seems clear that both groups were favorably impressed 
with the testing and counseling service. The high percentage of returns 
indicates that rapport had been well established, especially so for those 
personally counseled. The differences between the responses of the two 
groups are in general not statistically reliable, though what difference 
there is favors the personally counseled group. The use of counseling 
letters, however, is clearly shown to be an effective means of reporting 
results. 

Unfortunately, a third procedure was not utilized, namely personal 
counseling plus a summary letter to be retained by the counselee. Had 
this combined procedure been used, it seems reasonable to believe that a 
still higher percentage of favorable reactions to the counseling service 
might have resulted. 

Received January 15, 1948. 


The Psychogalvanometric Method for Measuring the 
Effectiveness of Advertising * 


Gordon Eckstrand and A. R. Gilliland 
Northwestern University 


Advertisers have long been searching for objective techniques or 
methods of pre-testing advertising material which are inexpensive, fast, and 
reasonably valid; that is, techniques or methods of predicting, in ad- 
vance of use in an advertising campaign, the effectiveness of certain ad- 
vertising material as judged by a criterion of volume of sales induced. 

Whether an advertisement is a good one or not can only be determined, 
in the last analysis, by running the ad as scheduled and then observing 
the effect on sales exclusive of other factors. The buying public is, after 
all, the final judge. But this is an expensive method of operating con- 
sidering both time and money, since it does not permit the weeding out 
of poor ads before they are put before the public as part of an advertising 
campaign. In 1946 more than two billion dollars was spent for all kinds 
of advertising. With this great amount of money being spent, it is im- 
portant for advertisers to get as much as possible out of each advertising 
dollar. Thus the pre-testing of advertisements is of great economic in- 
terest as well as an interesting problem in the prediction of human 
behavior. 

In an attempt to get some idea of what to expect from an advertising 
appeal in advance of its actual use on the public, and in an effort to 
determine what factors go toward making good and poor ads, advertisers 
have developed several techniques for testing their material. Experts’ 
judgments, cross sections of public opinion, point rating systems, memory 
for ads, point-of-purchase sales tests, and split runs in media of limited 
circulation have all been used to test advertising material. However, 
some of these techniques have shown little validity, and others are time 
consuming and costly. Consequently the field of advertising is still 
looking for a valid and rapid method of measuring the effectiveness of 
advertising matter. 

It is the purpose of this research to investigate the usefulness of the 
psychogalvanic response as a measure for use in predicting the effective- 
ness of advertising material as measured by a sales test criterion. 

* The authors are indebted to Mr. G. Maxwell Ule, Director of Research, McCann-Erickson, 
Inc., Chicago, Ill., for permission to use the ads and appeals used in this study 
and for the sales test results used as a criterion in this study. 

For a good many years after its discovery as a psychological measuring 
tool in 1888, the psychogalvanic phenomenon enjoyed almost unbelievable 
popularity in psychological research. It has been studied with reference 
to everything from attitude (1) to the effect of cobra venom (7). How- 
ever, when we turn to a consideration of the psychological correlates of the 
psychogalvanic response we find little agreement among investigators. 
At various times and by various investigators, the psychogalvanic re- 
sponse has been claimed as a measure of emotion, conation, attitude, at- 
tention, level of consciousness, and many others (5). 

Landis and Hunt (6) have pointed out that the galvanic response is 
not “a measure of, regular criterion of, or indicator of, any one or a com- 
bination of these traditional psychological categories.” However, as 
both Landis and Darrow (3) have agreed, it seems to be a fairly certain 
method of demonstration of general autonomic activity. 

It seems fairly well established, then, that while many stimuli and 
stimulus situations may serve to elicit the psychogalvanic response, the 
response seems to be a good measure of the amount of general bodily 
arousal present at any time or during any portion of behavior. It seems 
equally well established that the psychogalvanic response is not a valid 
and reliable measure of any of the traditional psychological categories. 
This does not necessarily mean, however, that the psychogalvanic re- 
sponse will not be of value in predicting certain more complex types of 
responses. It may be that in a response as complex as a person’s reaction 
to an advertisement, several or many of the psychological conditions 
mentioned above may be present and affecting behavior. It is this total 
response to the situation, this total amount of arousal in which we are 
interested. The psychogalvanometer seems well suited to measure this 
total arousal. 

There has been very little work done using the galvanometer to test 
advertising material. However, some evidence has accumulated to indi- 
cate that the changes in skin resistance of selected samples of subjects 
exposed to advertising material may be of value in predicting the later 
effectiveness of that material. Ruckmick (8) conducted a study in which 
the responses of the sweat glands were recorded during a three second 
exposure of advertising copy. Several series of copy, run with twenty 
subjects, revealed an internal consistency of data and also gave results 
which tallied in a general way with the choices obtained by the serial 
procedure of impression. 

Conrad (2)¹ conducted an investigation to determine whether it was 
possible to study the responses made to advertising appeals of car cards 
by means of a psychogalvanic response apparatus. Using a Hathaway 
galvanometer, he exposed a series of car cards to a large group of subjects 
for five seconds each. The subjects used were college students and the 
cards were presented in a counterbalanced order. He later had the sub- 
jects rank the ads as to their effectiveness in getting attention. He found 
that the results obtained in this manner correlated only .18 with the re- 
sults obtained by the galvanometer. He did find, however, that definite 
galvanic responses could be obtained with advertising material as stimuli, 
and that certain material got larger responses than other material. 

¹ This investigation was done under the direction of Dr. A. R. Gilliland. 

Gilliland and Sharp (4) showed that the psychogalvanometer does 
record variations in the effect of advertising on readers. They did not 
attempt, however, to correlate the size of the subjects’ galvanic reactions 
with the effectiveness of the ads as determined by an outside criterion. 
They pointed out the need for using the psychogalvanometer to test ads 
that had already been evaluated as to selling effectiveness in order to 
establish the validity of the method. 

In these earlier studies the technique has not been subjected to a 
rigid experimental test where a suitable subject group was used and where 
the method was validated against a suitable objective criterion. The 
few studies reported here have used either no criterion of the ads’ effective- 
ness or have used only the subjects’ opinion. This is, at best, only a 
criterion of very limited value. The best, most direct, and most objective 
criterion readily available is some measure of the ads’ actual selling effect- 
iveness in a realistic advertising situation. It is the purpose of this 
research to test the hypothesis that effective advertising material, as 
judged by a sales criterion, will, on the average, induce larger psycho- 
galvanic responses in a selected sample of the population than will less 
effective advertising material. 


The Experiment 


Subjects. The material tested dealt with three popular, nationally 
advertised food products made by the same company. An attempt was 
made to obtain a subject sample which would approximate a sample from 
the population to which the ads were directed. Since the material dealt 
with nationally advertised products, the sample used falls short on one 
count immediately. The sample used had to be drawn from the area in 
and around Evanston, Ill. Evanston and the surrounding area cannot 
be considered a representative section of the country, but the sample 
drawn from this area seems more representative of the country at large 
than it does of the Evanston area. 

Since the material dealt with in this study was concerned with basic 
food products, the sample was made up of married women or single women 
who cook and purchase groceries. A few women were included who were 
engaged to be married and thus will soon be part of the potential buyers 
of these products. An attempt was made to get a distribution of subjects 
from the various income and age groups and a distribution of subjects 
with and without children. Due to the difficulty of obtaining subjects, 
no attempt was made to match local or national statistics on these factors. 
Table 1 presents the number of subjects falling in each of the categories. 


Table 1 
Analysis of the Subject Group 

Income           Number    Age         Number    Children       Number 

Below $3,000       18      Below 24      15      No children      29 
$3,000-$5,000      16      25-39         18      Children         19 
Above $5,000       14      40-54         11 
                           Above 55       4 





Ads and Appeals Used. Three series of advertising material were 
tested. Two of the series consisted of advertising appeals made up into 
finished advertisements and the third series was composed of advertising 
appeals in verbal form not yet made up into ads. 

Series 1 consisted of three finished ads of pancake flour. Each ad 
was 11” by 8½” and was done in black and white. With respect to all 
variables but basic appeal the ads were quite similar. They contained 
about equal amounts of pictorial illustration, headlines of approximately 
equal length, about the same amount of copy, and the brand name was 
used equally often. Series 2 consisted of two finished ads dealing with 
a baby food. Each ad was 16” by 9” and was done in black and white. 
Again the ads were quite similar with respect to all variables but basic 
appeal. All the finished ads were mounted on stiff, white cardboard. 
Series 3 was made up of four advertising appeals or themes of a popular
brand of flour. These were basic themes or ideas which might later be 
used as a basis for the formulation of finished ads. Since the sales test 
to be used as a criterion was made with verbal presentation of the appeals, 
it was decided to record the appeals so that they could be presented to the 
subjects in a similar manner. The appeals were recorded by an an- 
nouncer with radio experience. The announcer was told to make each 
presentation as constant as possible. He was informed as to the nature 
of the experiment and told that we were interested in measuring the 
effectiveness of the basic theme or idea contained in the appeal and did 
not want effectiveness to vary as a function of the different qualities of 
his presentation. It is not possible to ascertain how well this purpose 
was accomplished, but of the several persons who have listened to the 
presentations, none have detected any bias in favor of any one appeal. 
Procedure. Two rooms were used in this investigation. One room
was used for the presentation of the advertising material to the subject, 
and the adjoining room contained the equipment for recording the psy- 
chogalvanic responses and the equipment for playing the recorded mater- 
ial. One experimenter was in the room with the subjects and gave in- 
structions and presented the material. The other experimenter was in 
the adjoining room and controlled the recording apparatus. The ex- 
perimenters were in contact with each other by means of a two-way 
signal system. 

The room in which the subject was seated was bare of distracting in- 
fluences as far as this was possible. The room was semi-sound-proofed, 
and although it did not keep out all sounds, it reduced the extraneous 
noises to a minimum. All daylight was excluded and the room was 
lighted by electric lights so that the light on the ads would be constant. 
The ads were presented on a stand which was adjustable for height and 
distance and were presented at eye level. A blank piece of white card- 
board covered the first ad and a similar piece separated each of the fol- 
lowing ads so that the experimenter could control the rate of presentation. 

When a subject arrived she was brought into the room, and the elec- 
trodes were fastened to her palm and arm. As most people have a 
distinct aversion to being shocked by an electric current, this disturbing 
influence was removed as far as possible by telling the subject that there 
was absolutely no danger of being shocked. The subject was told to sit 
relaxed and that all that was required of her was to look at and listen to 
the material as it was presented. She was told to look at the ads as if 
she were seeing them in a newspaper or magazine and to listen to the 
appeals as if she were hearing them over the radio or someone was saying 
them to her. 

Within any series, the ads and appeals were presented in a counter- 
balanced order, and the presentation of the series themselves was also 
counterbalanced. This procedure controlled position effects and the
inter- and intra-series influences of an ad or appeal on another.
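The counterbalancing just described can be illustrated with a short sketch. The source does not state the exact scheme, so this version assumes that every possible order of the ads within a series is used and that the orders are rotated across successive subjects. The function name, the ad labels, and the use of 48 subjects for every series are illustrative assumptions.

```python
from itertools import permutations

def counterbalanced_orders(items, n_subjects):
    """Give each subject one presentation order, cycling through all
    permutations so that every order is used about equally often."""
    orders = list(permutations(items))
    return [orders[i % len(orders)] for i in range(n_subjects)]

# Hypothetical example: three ads labeled A, B, C and 48 subjects.
schedule = counterbalanced_orders(["A", "B", "C"], 48)

# How often each ad appears in the first position.
first_position = [order[0] for order in schedule]
```

With three ads there are six possible orders, so under this assumed scheme each order is seen by eight of the 48 subjects and each ad leads equally often.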

The subject was allowed to relax for a period of three to five minutes 
after the completion of the instructions in order for her to get used to 
the situation. This tended to make the galvanic readings more stable. 
At a signal from the experimenter running the apparatus, the other ex- 
perimenter removed the first blank card thus exposing the first ad. In 
order to accustom the subject to this procedure, the first printed
advertisement and recorded appeal were always “dummies” during which time
no readings were taken. This also tended to make the galvanic readings 
more stable. The ads were presented for a 30 second period while the 
appeals lasted about 15 seconds. Between 30 and 45 seconds was allowed 
between the presentation of the ads and appeals within a series and
between 45 and 60 seconds was allowed between each series. This interval
depended upon the stability of the psychogalvanic readings at the time. 
The beginning and end of each exposure period was marked on the tape 
recording of the subject’s responses. 

Apparatus. The apparatus used in obtaining the galvanic readings 
was a two stage vacuum tube voltage amplifier with direct coupling. It 
was designed specifically for this type of research and this type of measure- 
ment. The apparatus has the advantage of ease of manipulation, ac- 
curacy in giving quantitative comparisons, and high sensitivity. An 
additional advantage was the obtaining of permanent records by graphi- 
cally recording the psychogalvanic responses by means of an Esterline 
Angus graphic recorder model A. W. 

Zinc electrodes about one inch in diameter were used. These were
attached by means of leather straps, and sponge rubber between the
electrode and the strap assured an even contact with the skin area.
One electrode was attached to the palmar surface of the hand and the 
other to the inner surface of the forearm. Commercial electrode paste 
and jelly were used to facilitate contact with the subject’s skin area. 

The graphic chart of the recorder moved with a speed of three inches 
per minute and the magnified changes in the subject circuit were recorded 
on the moving chart by means of a writing mechanism. The machine 
was calibrated with a decade resistance box so that the recorded responses 
could be read off as changes in subject resistance. 

Criterion.* The criterion used in all three series of ads and appeals
was the results from sales tests conducted by the McCann-Erickson Ad- 
vertising Agency in Chicago. The purpose of these sales tests was in 
each case to analyze the relative sales effectiveness of the ads and appeals 
in question. 

The tests, in each case, were made through a study of the movement 
of store inventories associated with consumer exposure to the alternative 
advertising material studied. The studies were all conducted using 
stores situated in what were believed to be representative urban com- 
munities. In the consumer exposure to the various advertising materials, 
strict counterbalancing techniques were used. This tended to control the 
effect of random factors, biases from the cumulative impact of advertising 
exposure, and from the sequence of presentation of the various appeals. 

In all of the tests strict and rigid controls were used. Therefore, since
advertising was the major variable in the stores during the test, it is
reasonable to assume that the differences in sales revealed by the store
inventories were the result of advertising.


*A more complete description of the criterion tests cannot be given due to the con- 
fidential nature of the techniques. 
Of the many possible ways of evaluating the changes in resistance,
only one was used in this study: the total log conductance change during
the exposure to any ad. That is, the log conductance changes for each
ad were summated, giving a total arousal value. However, no change was
recorded unless there was at least 200 ohms of change, and no differences
between ads were recorded unless the change was 10% or greater.
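A minimal sketch of this scoring rule follows. The source gives no formula, so several points are assumptions: each response is taken as a pair of resistance readings (before, after) in ohms, only drops in resistance are counted, logarithms are to base 10, and the 10% rule is read as a relative difference between the two totals. All names and readings are hypothetical.

```python
import math

MIN_CHANGE_OHMS = 200   # smaller resistance changes are not recorded (from the text)
MIN_DIFFERENCE = 0.10   # assumed reading of the "10% or greater" rule

def total_log_conductance_change(responses):
    """Sum the log conductance changes over (before, after) resistance pairs."""
    total = 0.0
    for before, after in responses:
        if before - after >= MIN_CHANGE_OHMS:
            # Conductance is 1/R, so log(1/after) - log(1/before) = log(before/after).
            total += math.log10(before / after)
    return total

def larger_arousal(total_a, total_b):
    """Return 'A' or 'B' for the larger total, or None if they differ by under 10%."""
    larger = max(total_a, total_b)
    if larger <= 0 or abs(total_a - total_b) / larger < MIN_DIFFERENCE:
        return None
    return "A" if total_a > total_b else "B"

# Hypothetical readings for one subject during one ad; the 100-ohm change is ignored.
readings = [(50000, 48000), (48000, 47900), (47900, 46000)]
arousal = total_log_conductance_change(readings)
```

Summing logarithms of resistance ratios makes the score depend on proportional rather than absolute change, which is presumably why a log conductance measure was preferred to raw ohms.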




Results 


The problem of this study was the relationship between the total 
arousal produced by the ad and its sales effectiveness. If two ads had
equal arousal they would produce equal log conductance changes or one 
would be greater in half of the cases and the other would be greater in the 
other half. Any variation from this one-to-one relationship could reason- 
ably be attributed to the greater efficiency of one appeal over the other.* 
The significance of any deviation from this ratio can be checked by the 
Chi square method. Table 2 gives the number of times each ad in each 
of the three series produced the largest arousal value and the Chi square 
values for these differences. 

From Table 2 we can examine each of the three series of ads. In the 
pancake flour ads it is apparent that ad A gave more high arousals than 
ad B. The chi square value of 3.26 would occur by chance not more than 
about seven times out of 100. The chi square of 3.78 between A and C 
would occur not more than about five times out of 100. The difference 
between B and C was insignificant. In the baby food series there were 
no significant differences between the two ads. There were likewise no 
significant differences in the flour ad appeals. 
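The chance probabilities quoted for the chi square values can be checked with a short sketch. It assumes a comparison of two ads against an expected 50:50 split, one degree of freedom, and no continuity correction; the source does not say whether a correction was applied. The function names and the counts 30 and 10 are hypothetical.

```python
import math

def chi_square_equal_split(wins_a, wins_b):
    """Chi square for two observed counts against an expected 50:50 split."""
    expected = (wins_a + wins_b) / 2
    return ((wins_a - expected) ** 2 + (wins_b - expected) ** 2) / expected

def chance_probability(chi_square):
    """Upper-tail probability of a chi square with one degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

# Hypothetical counts: ad A gave the larger arousal 30 times, ad B 10 times.
example_chi_square = chi_square_equal_split(30, 10)

# Values reported in the text for the pancake flour ads.
p_a_vs_b = chance_probability(3.26)   # roughly 0.07
p_a_vs_c = chance_probability(3.78)   # roughly 0.05
```

Under these assumptions the two probabilities agree with the "seven times out of 100" and "five times out of 100" given in the text.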

These same data for the arousal value of the three series of ads were 
analyzed by another method. The smallest log conductance obtained 
for each subject was arbitrarily given a value of 0 and the highest value 
obtained a value of 10; other values were distributed between these
extremes. Table 3 gives the means for each ad in each series by this
method. 
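The rescaling can be sketched as follows, assuming that "distributed between these extremes" means a linear interpolation within each subject's own range. The function name and the sample values are hypothetical, and the handling of a subject whose values are all equal is an added assumption.

```python
def scale_zero_to_ten(values):
    """Linearly rescale one subject's values so the lowest is 0 and the highest is 10."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: with no spread, every value is scored 0.
        return [0.0 for _ in values]
    return [10 * (v - low) / (high - low) for v in values]

# Hypothetical log conductance totals for one subject (arbitrary units).
scaled = scale_zero_to_ten([2, 5, 8])
```

Rescaling within each subject removes differences in overall reactivity between subjects, so that means across subjects reflect the relative standing of the ads.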

The differences between these means were checked for significance by
the Fisher “t” test. Table 4 gives the “t” value for each comparison
for each of the three series of ads.
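A sketch of one way such a “t” value could be computed is given below. The source does not say whether the scores were treated as paired or independent; since every subject saw all ads in a series, this sketch assumes paired scores. The function name and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(scores_x, scores_y):
    """t for the mean within-subject difference (assumes paired scores)."""
    diffs = [x - y for x, y in zip(scores_x, scores_y)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error

# Hypothetical 0-to-10 scores of four subjects for two ads.
t_value = paired_t([5, 7, 6, 9], [3, 6, 6, 5])
```

The resulting value would be referred to a “t” distribution with one fewer degrees of freedom than the number of subjects.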

The “t” score between ads A and B for the pancake flour ads was
1.60. This means that if no difference in effectiveness existed between 

* The authors are aware that other assumptions can be made about the distribution 
of the expected frequencies and the treatment of the cases in which no differences were 
found in galvanometric readings between the ads in a series. However, any method of 
calculation would give similar results and the method here used seems as defensible 
as any. 
Table 3
Mean Reactions for Each Ad

        Pancake Flour     Baby Food         Flour
Ad       N     Mean       N     Mean       N     Mean
A       46     5.50      48     3.73      48     1.31
B              4.03      48     4.52      48     1.49
C       46     3.98                       48     1.28
D                                         48     1.68








the ads a “t” as large as this and in the same direction would be obtained
only about seven times out of 100 by chance. The “t” value of .06
between B and C was insignificant. 
Table 4
“t” Test for Significance of Difference

Pancake Flour      Baby Food        Flour
Ads      t         Ads     t        Ads     t
A-B    1.60        A-B    .90       A-B    .33
A-C    1.53                         A-C    .06
B-C     .06                         A-D    .48
                                    B-C    .58
                                    B-D    .37
                                    C-D    .73





The difference between the baby food ads would occur about 17 times 
out of 100 by chance and was therefore on the borderline of probable 
significance. None of the flour ads showed statistically significant dif- 
ferences. 

Both of the above types of analysis lead to the same general results. 
The results for the two methods can now be compared with the sales 
efficiency of the ads as a measure of the value of the galvanometric method 
of testing ads. 

Criterion Data. In the sales test conducted with the pancake flour 
ads, it was found that ad A sold 2.1 times as many packages of flour as 
did either of the other two ads. There was little difference between ad 
B and ad C. Ad A sold 100 units of flour, ad B 47 units, and ad C 
48 units. 

In the sales test on the baby food ads, no significant difference was 
found in the selling effectiveness of the two ads. Ad A sold 92 units and 
ad B sold 100 units. 

In the sales test conducted using the four advertising appeals or 
themes, it was concluded that there was a significant difference in the 
relative sales effectiveness of the four appeals tested. The differences 
were small, however, and the advertising agency concluded that, for
practical advertising purposes, the actual degree of difference was so small 
that any of the appeals could be used with about equal effectiveness. 
Appeal A sold 96 units to the people hearing its sales talk, appeal B sold 
100 units, appeal C sold 83 units and appeal D sold 90 units. 


Summary 


Close agreement was found between the galvanic changes produced by
a series of pancake ads and the sales effectiveness of these ads. The sales
effectiveness of ad A was 2.1 times as great as that of either ad B or C.
Little difference was found between B and C. Both the Chi square method
and the “t” test indicated that ad A was “better” (galvanic responses)
than the other two ads at the 7% level of significance. By the method
described here, no attempt was made to determine how much A exceeded
B and C. B and C were not significantly different in their galvanic
responses.

The baby food ads had almost equal sales appeal. In their galvanic 
responses there was no statistically significant difference. 

The results are more equivocal for the four flour ads, although the 
sales tests showed statistically significant differences. These differences, 
however, were small and the advertising agency stated that for practical 
purposes the four appeals could be considered equal. The differences 
between the galvanic responses to these appeals were not statistically 
significant. 

In conclusion, it may be stated that this study adds positive evidence 
in behalf of the hypothesis that, under properly controlled conditions, the 
effectiveness of advertising material can be predicted by the
psychogalvanic method. Further work is needed, of course, with different types
of advertising material and with material of different degrees of effective- 
ness. However, the technique gives promise as an objective evaluation 
of ads and advertising appeals. 


Received December 15, 1947. 
Book Reviews


Adkins, Dorothy C., assisted by Primoff, Ernest S., and McAdoo, Harold
L., of U. S. Civil Service Commission, and Bridges, Claude F., and
Forer, Bertram, of U. S. War Department. Construction and analysis
of achievement tests. U. S. Government Printing Office, 1947. Pp.
292. $1.25.


In their volume on testing for human skills and capacities important 
to the public service Dr. Adkins and her colleagues have not only followed 
the canons of scholarship admirably but have also made the techniques of 
measurement clear for intelligent laymen and reasonably comprehensive 
for the specialist. Directed primarily to achievement as against aptitude 
testing, and to the prediction of job performance, it is the first volume of 
its kind with chief emphasis on the development of tests by and for public 
personnel agencies. 

Unlike most “practical” books this text is not superficial. Difficult, 
complex topics and techniques are not dodged, if they are necessary to an 
understanding of test development. Rather, they are faced squarely. 
They are, however, elaborated beyond the point necessary to the com- 
prehension of a trained specialist, as will be understandable. Technical 
terms are defined and explained in the context where they first arise and 
also in a full, detailed glossary. 

Extensive tryout of the materials in training courses has demonstrated 
that persons new to the field of testing can learn, with the aid of this text, 
to apply the concepts and methodologies germane to testing in the public 
service. For this reason, the volume should be invaluable for federal 
committees and boards of examiners functioning for departments of 
government under the policy of decentralized examining. 

College teachers in the field of tests and measurements will find this 
book a valuable adjunct to their reference library or their list of collateral 
reading. Among others to whom it will be useful are college placement 
and testing services, college departments engaged in large-scale examin- 
ing, and industrial concerns with well established or prospective per- 
sonnel testing programs. The tabular and graphical materials, and 
oftentimes the text itself, should prove a boon even to the sophisticated 
technician and researcher. 

Although theoretical questions are strictly excluded, adequate dis- 
cussion plus the necessary modus operandi of calculation are given for 
means, standard deviations, standard errors of differences, tetrachoric 
and point biserial correlation, and multiple regression. Thirty-four
tables simplify the machinery of statistical calculation and serve as an 
excellent step-by-step process to orient and inform the newcomer to the 
field and to furnish handy tools for the seasoned researcher. Twenty-two 
figures supplement the tabular material. Twenty-four “exhibits” make
clear many practical applications of measurement and statistical meth- 
odology to problems of selecting personnel—trades journeymen, clerical 
workers, professionals. 

Dr. Adkins’ text should take a place among the signal and enduring 
contributions to the field. 


Fred S. Beers 
State Technical Advisory Service,
Social Security Administration,
Washington, D. C.


Crawford, Albert B., and Burnham, Paul S. Forecasting college
achievement. A survey of aptitude tests for higher education, Part I. General
considerations in the measurement of academic promise. New Haven: 
Yale University Press, 1946. Pp. 291. $3.75. 


This book may be recommended to those interested in student
personnel work at the college level. It is concerned primarily with guidance
of students into those fields of study in which they can be most successful.
The framework of concepts and procedures basic to measurement and 
prediction of special abilities is presented in such a way as to be useful not 
only to technicians but also to administrators and faculty members in 
general. 

The book opens with an historical survey, elementary to the psycho- 
logist, but instructive to those in other fields. It includes clear definition 
of such concepts as aptitude, skill, and achievement, with examples to 
show how the tests operate. The difficulties inherent in aptitude testing 
are clearly presented, and theoretical methods of attacking the problems 
are suggested. 

Chapter two is a review of statistical principles involved in test work. 
It has value as an indication of the practical function of the statistical 
methods ordinarily taught in tests and measurement courses. The ma- 
terial quite naturally constitutes an argument in favor of advanced 
statistical courses as well. 

Chapter three contrasts the so-called “general intelligence test” with
tests which are intended to measure several more or less independent 
capacities. Several of the widely-used tests for adults and college stu- 
dents are described, discussed, and criticized. One can agree with most 
of the criticism directed against the few tests available for use at the 
college level. 
Chapter four is a review of achievement testing, indicating a trend
toward the measurement of higher and more complex thought processes. 
In comparing essay-type and objective examinations, attention is given 
to the means for eliminating or minimizing the alleged weaknesses of the 
latter. The discussion is centered around a few well-known, large-scale 
testing projects which have provided instruments for use with college 
students. Included are basic facts concerning the degree of success
achieved by present methods for predicting college grades.

Chapter five presents as a sample aptitude battery certain tests used in 
studies at Yale University. Methods, techniques, and results achieved 
in differential prediction of success in the liberal arts, in pure science, and 
in applied science, are discussed. The data are valuable, and the theo- 
retical implications are significant. 

Chapter six is a discussion of the theory of factorial analysis, and a 
presentation of some results secured by such factoring methods. Em- 
phasis is placed upon the Thurstone studies of Primary Mental Abilities. 
Crawford and Burnham indicate rather definitely that tests based upon 
factor analysis are of less practical value in guidance than are measures 
obtained by the older methods for development of aptitude tests. 

The last chapter is a review of test construction, with special em- 
phasis upon the measurement of idiosyncrasies. The procedures essen- 
tial to effective construction of such tests are described and explained. 
The discussion emphasizes methods used by the College Entrance
Examination Board. An interesting detail is the fact that some of the
methods developed for use with new-type tests have been applied to tests 
of the essay type. 

The appendices include some tables of statistical data, and some 
sample items suggesting the mental processes involved in the Yale battery 
of educational aptitude tests. 

The book furnishes clear insight into the activities characteristic of 
measurement in modern educational guidance. One need not accept all 
its conclusions; one can, for example, reconcile the views of the workers 
who still respect the IQ with the views of those who desire more analytic 
measures. The reviewer disagrees with the authors on several minor 
points, but finds the work as a whole characterized by sound judgment 
and good common sense. The book can be a very useful reference work 
for teachers of educational guidance, statistics, and tests and measure- 
ments. It is an important book for personnel workers interested in the 
selection and training of students in professional schools. 


Harold D. Carter
University of California
Braun, Carl F. Fair thought and speech. Alhambra, California:
Privately published, 1947. Pp. 50. $1.25 single copy. $1.00 twelve
or more.


The keynote of “Fair Thought and Speech” is to be found in an
introductory quotation from John Ruskin, “We require from men two kinds
of goodness: first, the doing of their practical duty well; then that they 
be graceful and pleasing in doing it; which last is itself another form of 
duty.” The major part of the book is devoted to a discussion of the 
means whereby men can be graceful and pleasing in doing their practical 
duty well. 

The text of the book is a letter which the author, as president of a
manufacturing firm, wrote, as part of a series, to each of his employees.
The principles which the book enunciates are designed to apply with equal
force to everyone, to leader and workman alike. A fair indication of
the content and general tone of the material is to be found in the chapter 
headings. Representative of these are: “Don’t Act Superior,” “Don’t
Question too Fiercely,” “Don’t Be too Positive,” “Don’t Be
Stiff-necked,” “Don’t Be a Worm,” “Don’t Be Unfair,” “Don’t Snap, Don’t
Scowl,” “Ego,” and “Concession.” As may be readily observed, the
author might better have used positive suggestion rather than negative 
suggestion in his approach. 

It is very evident that the author is making a sincere attempt to apply 
Christian principles to business intercourse. In fact, he makes frequent 
use of Biblical quotations to support his thesis. Typical of these are 
“A soft answer turneth away wrath; but grievous words stir up anger”
(Proverbs 15:1); “Sweet language will multiply friends; and a
fair-speaking tongue will increase kind greetings” (Ecclesiastes 6:5); and
“Can a man take fire in his bosom, and his clothes not be burned?”
(Proverbs 6:27).

The book is simply and clearly written with short sentences and para- 
graph headings to emphasize and drive home the author’s message force- 
fully. It is evident that the author especially feels the need for tolerance 
in human relations as he makes an outstanding plea for it under the
caption “Looking Down,” as follows:


“Let’s not set ourselves above others. Let’s not think or talk 
about people Below us or Under us. Let’s say, With us or Around us. 
Let’s not spread information Down, but rather Out. Let’s not Hire 
people, but rather Take them on or Have them join us. . . . Let’s not 
talk of Superiors, but of Leaders. Let’s not speak of Telling people, 
but rather of Asking. Let’s have no talk of, I am better than thou.” 


Mr. Braun’s basic philosophy is clearly stated in his final chapter, 
“By Little and Little,” as follows: “Little Drop: In human relations,
perhaps more than in other things, success or failure is made up of little
things. A friendly word a day will do the trick, will build success. An
illiberal word a day, even one a week, will do the trick too, will dig a pit
for any man. Illiberal words, missile words, condescending words,
slippery words, sly words: let’s drive them completely out of our thoughts
and speech. ‘Weigh thy words in a balance, and make a door and bar
for thy mouth’ (Ecclesiastes 19:1).”

The sincerity of the author’s purpose will be evident to every reader. 
It is only to be hoped that Mr. Braun’s employees and the others to whom 
the book is directed will accept his words in the same spirit in which he 
has written them. 


Robert N. McMurry 
Robert N. McMurry & Co.,
Chicago, Illinois


Churchman, C. W., Ackoff, R. L., and Wax, Murray. Measurement of
consumer interest. Philadelphia: University of Pennsylvania Press, 
1947. Pp. 214. $3.50. 


A conference on the measurement of consumer interest was called by 
a group of University of Pennsylvania philosophers with the coordination 
of research as its objective. This book presents a record of the pro- 
ceedings. 

The section on problems in practice covered a variety of somewhat 
unrelated topics in a fairly informal way: exaggerated responses in polling 
(Crossley); preference and performance (Preston); the research client as 
a problem (Blankenship); the problem of getting people interested in be- 
coming more efficient consumers (Doubman); the use of call-back in- 
terviews (Stock); and the researcher as a problem (Ellis). 

In contrast, the section on ways of evaluating preferences covered 
fewer problems but dealt with each one much more thoroughly and for- 
mally. Thurstone presented several theorems on the prediction of the 
frequency of first preferences when the scale values and discriminal dis- 
persions are known for each stimulus and developed a method of com- 
putation. Of special practical importance is his estimation formula which 
is restricted neither by the shapes of the affective distributions nor by 
their intercorrelations, for use in connection with the method of successive 
intervals. 

Irwin described several experiments to illustrate how preferences are 
affected by factors other than the physical characteristics of the objects. 
These illustrations should be studied carefully by any one who attempts 
to measure preferences or to interpret reasons which respondents give 
for their preferences. 
Guttman gave a step by step description of the use of the Cornell
technique for scalogram analysis. In addition to the technique of
content analysis he described two methods of intensity analysis: the
“fold-over” and “two-part” techniques. The description is detailed enough
so that any one could employ these techniques in his own field. Even 
people who refuse to accept the specific techniques will have to admit 
that a real contribution has been made by proposing a solution to such 
problems as biases in question wording and determining whether an 
attitude is scalable. 

The wide area which the conference attempted to cover is illustrated 
by the banquet address on the meaning of consumer interest. This was 
a discussion of the consumer movement. 

The section on the meaning of measurement became more philo- 
sophical. Singer struck the keynote of the conference by supporting 
cooperation rather than isolation. Deming distinguished between
“qualitative” and “quantitative” surveys and set up criteria for a satisfactory
statistical program; and Churchman discussed the consumer and his 
interests. 

Perhaps no section supported the theme of the conference, the need 
for cooperative research, more strongly than the discussion of
specifications for consumers’ goods. Wilks and Peach presented a stronger case
with specific examples than could have been presented with many more 
words in the form of generalities. In addition, the topic was well handled 
by Breyer, Curtiss, and Palmer, as well as by Wilks and Peach. 

The discussion of sampling techniques produced the usual points and 
issues. This would be expected since the sampling problems in the meas- 
urement of consumer interests are about the same as the sampling prob- 
lems in any other consumer field. 

The section on the application of the measurement of attitudes con- 
sisted of illustrations from two fields. Viteles discussed the measurement 
of employee attitudes and went far beyond the mere listing of the results 
of attitude surveys. He also pointed out how they should be combined 
with the results of direct experimentation and with other types of infor- 
mation in reaching practical conclusions. Cartwright gave a nontech- 
nical description of the research program used to guide the War Loan 
Drives. 

Evidently the conference accomplished a number of things, some of 
them probably unintentionally. It established the need for greater
cooperation in research. It demonstrated that the field of “the
measurement of consumer interest” is loosely defined by covering a range of
unrelated topics many of which would be considered irrelevant to the topic
by most people. It highlighted the terrific gaps in our knowledge as far 
as this field is concerned. 
At any rate, the proceedings are well worth reading. They bring
together in convenient form material from a number of fields and view- 
points, and thus they provide a many-sided sketch of this important field. 


Alfred C. Welch 
Knox Reeves Advertising, Inc.,
Minneapolis, Minnesota


Moncrieff, R. W. The chemical senses. New York: John Wiley and
Sons, 1946. Pp. vii + 424. $4.50.


In comparison with vision and hearing, man’s chemical senses contri- 
bute little to his intellective activities. However, in terms of personal 
adjustment and social intercourse their place is by no means a lowly one. 
The use of perfumes is an old art. As far as the “stronger” sex is con- 
cerned, the satisfaction of the palate would rank well with sex in marriage. 
The chemical senses are entangled in a variety of ways in the economic 
and political struggle. In the business world, brand loyalties created by 
using a specific shade of flavor represent tremendous economic assets. 
Such seemingly unassuming problems as packaging of food, changes in 
flavor with storage, and rancidity of fats are actually billion-dollar ques- 
tions. The irritant gases are of both industrial and military concern. 

In view of these facts it is somewhat surprising that standard texts of 
applied psychology barely touch on any of these topics. Thus Poffen- 
berger’s Principles (1942) do not even include in the index such headings 
as taste or gustation, and olfaction is mentioned briefly in connection 
with the use of psychological testing techniques for medical diagnosis. 

Moncrieff’s aim was to coordinate the data of physiological psychology
and of chemistry bearing on the theory of chemical senses, and to present 
data which would be useful to individuals concerned with such problems 
as manufacture of perfumes and food production. Psychologists
dealing with flavor and odor will find in Moncrieff’s book a valuable
preface, an “Einleitung” to this complex field.

There is a glossary of over 300 terms, an extensive author index in- 
cluding not only the name and page but also the topic in connection with 
which an author is being cited, and an excellent subject index containing 
some 4000 entries. The strong point of the book is the chemical treat- 
ment of the subject. The text on psychology of the chemical senses, 
particularly on applied psychology, is yet to be written. 

Josef Brozek 


Laboratory of Physiological Hygiene, 
University of Minnesota 
Books, monographs, and pamphlets for listing and possible review should be sent to 
Donald G. Paterson, Editor, Department of Psychology, University 
of Minnesota, Minneapolis 14, Minnesota 


Counseling employees. Earl M. Bowler and Frances T. Dawson. New 
York: Prentice-Hall, Inc., 1948. Pp. 247. $4.00. 

Psychology and military proficiency. Charles W. Bray. Princeton: 
Princeton University Press, 1948. Pp. 242. $3.50. 

The magic of believing. Claude M. Bristol. New York: Prentice-Hall, 
Inc., 1948. Pp. 245. $2.95. 

Applied psychology. Harold E. Burtt. New York: Prentice-Hall, Inc., 
1948. Pp. 821. $7.35. 

Public opinion and propaganda. Leonard W. Doob. New York: Henry 
Holt and Co., 1948. Pp. 600. $4.00. 

A human relations casebook for executives and supervisors. Frances and 
Charles Drake. New York: McGraw-Hill Book Co., Inc., 1947. 
Pp. 187. $2.50. 

The labor force in the United States 1890-1960. John D. Durand. New 
York: Social Science Research Council, 1948. Pp. 302. $2.50. 

Emotional problems of living. O. Spurgeon English and G. H. J. Pearson. 
New York: W. W. Norton and Co., Inc., 1948. $5.00. 

Clerical salary administration. Leonard W. Ferguson, Editor. New 
York: Life Office Management Association, 1948. Pp. 220. $4.00. 

Sickness absenteeism among male and female industrial workers, 1937-1946, 
inclusive. W. M. Gafafer. Washington, D. C.: Superintendent of 
Documents, U. S. Government Printing Office, 1947. Pp. 4. $.05. 

Guide to occupational choice and training. Walter J. Greenleaf. Wash- 
ington, D. C.: Federal Security Agency, Office of Education, 1947. 
Pp. 150. $.35. 

Shakespeare's Hamlet. Ernest Jones. New York: Funk and Wagnalls 
Co., 1948. Pp. 180. $2.50. 

Principles of personnel testing. C. H. Lawshe, Jr. New York: 
McGraw-Hill Book Co., Inc., 1948. Pp. 227. $3.50. 

An introduction to clinical psychology. L. A. Pennington and I. A. Berg, 
Editors. New York: The Ronald Press Co., 1948. Pp. 595. $5.00. 

Psychology and life. Third edition. Floyd L. Ruch. New York: Scott, 
Foresman and Co., 1948. Pp. 782. $3.60. 

Evaluation of group guidance work in secondary schools. Georgia M. 
Sachs. Los Angeles: University of Southern California Press, 1948. 
Pp. 120. $2.50. 

The unfolding of artistic activity. Henry Schaefer-Simmern. Berkeley: 
University of California Press, 1948. Pp. 202. $5.00. 








Psychology for living. Herbert Sorenson and Marguerite Malm. New 
York: McGraw-Hill Book Co., Inc., 1948. Pp. 637. $3.00. 

Difficulty prediction of test items. Sherman Tinkelman. New York: 
Bureau of Publications, Teachers College, Columbia University, 1947. 
Pp. 55. $1.85. 

Social psychology. Wayland F. Vaughan. New York: The Odyssey 
Press, Inc., 1948. Pp. 956. $5.00. 

American Psychological Association 1948 directory. Helen M. Wolfle, 
Editor. Washington, D. C.: American Psychological Association, 
1948. Pp. 429. $3.00. 

Building self-confidence. C. Gilbert Wrenn. Stanford: Stanford Uni- 
versity Press, 1947. Pp. 32. $.35. 

A 1948 survey of office salaries. American Business Report. Chicago: 
Dartnell Publications, Inc., 1948. Pp. 64. Report included with 
subscription to American Business at $5.00 for 15 issues. 

Developing public and industrial relations policy. General Management 
Series No. 140. New York: American Management Association, 
1947. Pp. 52. $1.00. 

Employee counseling services. Selected References No. 20. Princeton: 
Princeton University Industrial Relations Section, 1948. Pp. 4. $.10. 

Plan for action work kit, including report on employee opinion surveys. 
New York: Joint Committee Headquarters, Room 1750, 420 Lexing- 
ton Ave., 1947. $5.00. 

Labor market information (area series). Washington, D. C.: 
Superintendent of Documents, U. S. Government Printing Office, 1948. 
Issued monthly, $2.50 a year. 

Labor market information (industry series). Washington, D. C.: 
Superintendent of Documents, U. S. Government Printing Office, 1948. 
Issued monthly, $1.00 a year. 

Lighting schoolrooms. Washington, D. C.: Superintendent of Documents, 
U. S. Government Printing Office, 1948. Pp. 17. $.10. 

Principle of equalization applied to the allocation of grants in aid. 
Washington, D. C.: Superintendent of Documents, U. S. Government 
Printing Office, 1948. Pp. 225. $.75. 